Abstract: Multivariate statistics are often available as well as necessary
in hypothesis tests. We study how to use such statistics to control not only 
false discovery rate (FDR) but also positive FDR (pFDR) with good power. 
We show that FDR can be controlled through nested regions of multivari- 
ate p-values of test statistics. If the distributions of the test statistics are 
known, then the regions can be constructed explicitly to achieve FDR con- 
trol with maximum power among procedures satisfying certain conditions. 
On the other hand, our focus is where the distributions are only partially 
known. Under certain conditions, a type of nested regions are proposed and 
shown to attain (p)FDR control with asymptotically maximum power as 
the pFDR control level approaches its attainable limit. The procedure based 
on the nested regions is compared with those based on other nested regions 
that are easier to construct as well as those based on more straightforward 
combinations of the test statistics. 
1. Introduction

In multiple hypothesis tests, it is common to evaluate nulls with univariate 
statistics. This especially has been the case for tests based on FDR control 
[2, 12, 13, 16, 20, 22, 23]. On the other hand, for hypotheses on high dimensional 
data, such as those in classification or recognition for complex signals, multi- 
variate statistics in general are prerequisite for satisfactory results [1, 6, 24]. 
Each such hypothesis involves a sample of random vectors, from which a multivariate
statistic is derived to capture critical features of the sample. Given the
conceptual appeal of FDR control, it is natural to ask how it can be achieved 
using multivariate statistics. 

The FDR of a multiple testing procedure is defined as $E[V/(R \vee 1)]$, where
$R$ is the number of rejected nulls and $V$ that of rejected true nulls [2]. In
addition to FDR, power and pFDR [20] are two important measures to assess the
performance of a procedure. Recall that

\[
\text{power} = E\left[\frac{R - V}{(n - N) \vee 1}\right], \qquad
\text{pFDR} = E[V/R \mid R > 0],
\]



where n is the number of nulls, and N that of true nulls. The importance of 
power is well appreciated in the FDR literature [2, 13, 14, 20, 22]. In contrast, 
the issue of pFDR seems more subtle. Oftentimes, as follow-up actions can en- 
sue only after some rejections are made, pFDR is more relevant than FDR. 
However, unlike FDR, in general pFDR is not necessarily controllable at a de- 
sirable level, say below 0.4. The reason is that in many cases, test statistics 
cannot provide strong enough evidence to assess the nulls, especially when the 
data distribution is only partially known and the number of observations for 
each null is small. The controllability of pFDR can strongly affect power. For 
the well-known Benjamini-Hochberg (BH) procedure [2], if its FDR control
parameter is below the minimum attainable pFDR, then its power tends to 0 as
$n \to \infty$ [7]. In light of this, power and pFDR should be considered together
when designing testing procedures. 

A direct way to improve power and pFDR control is to collect more obser- 
vations for each null. However, this may not be feasible due to constraints on 
resources. On the other hand, if the observations can be viewed from differ- 
ent aspects each containing some unique information, then the aspects may be 
exploited together to yield more substantive evidence. 

The approach of the paper is to first establish FDR control based on multi- 
variate p-values, and then evaluate power and pFDR control. Among procedures 
that attain the same pFDR, the one with the highest power is preferred. Sec- 
tion 2 sets up notations and recalls known results. It then gives an example 
to illustrate when multivariate p-values may be useful for pFDR control. Sec- 
tion 3 presents a general FDR controlling procedure which uses an arbitrary 
family of nested regions in the domain of p-values. Then, it shows that if the 
data distribution under true nulls and that under false nulls are both known, 
then the nested regions can be chosen in such a way that the procedure has the 
maximum power among those with the same pFDR while satisfying certain con- 
sistency conditions. However, since full knowledge about data distributions is 
usually unavailable, the emphasis of the section is FDR control based on nested 
regions that approximate the optimal regions. Under certain conditions, the
approximating regions are ellipsoids under an $L^\epsilon$-norm, where $\epsilon > 0$ in
general is a non-integer.

Section 4 analyzes the power of the procedure based on the approximating 
regions. It shows that under certain conditions, the power is asymptotically 
maximized as the pFDR tends to the minimum attainable level. The procedure 
is compared with several others, including those that work "directly" on test 
statistics instead of p-values, for example, procedures that reject nulls with
large $L^a$-norms of the test statistics. It will be seen that only for $a < 0$, the
"direct" procedures may attain the same pFDR as the procedure based on the 
approximating regions. The section also considers a procedure based on nested 
rectangle regions in the domain of p-values and shows that it has the same level 
of pFDR control as the one based on the approximating regions. Although less
powerful, the procedure is simpler to compute. 

Section 5 considers examples of t and F statistics. Section 6 reports a simula- 
tion study on the procedures considered in previous sections. Section 7 concludes 
with some remarks. Most of the technical details are collected in the Appendix. 

2. Preliminaries 
2.1. Notation 

Denote by $K$ the dimension of a multivariate p-value. Points in $\mathbb{R}^K$ will be
taken as column vectors. With a little abuse of notation, for $f: A \to \mathbb{R}$ with
$A \subset \mathbb{R}^K$, $\sup f$ will denote the essential supremum of $f$, i.e.,
$\inf\{a : \ell(f^{-1}(a,\infty)) = 0\}$, where $\ell(\cdot)$ is the Lebesgue measure. If
$\xi_{i1},\dots,\xi_{iK}$ are marginal or conditional p-values under a null $H_i$, then
$\xi_i = (\xi_{i1},\dots,\xi_{iK})'$ will be referred to as a multivariate p-value
associated with $H_i$. The discussion is under a random effects model as follows
[10, 13]. Denoting by $a \in (0,1)$ the proportion of false nulls and
$\theta_i = 1\{H_i \text{ is false}\}$,

\[
\begin{array}{l}
(\theta_i, \xi_i) \text{ are i.i.d. such that } \theta_i \sim \text{Bernoulli}(a) \text{ and} \\
\text{given } \theta_i = 0,\ \xi_{i1},\dots,\xi_{iK} \text{ are i.i.d.} \sim \text{Unif}(0,1), \\
\text{given } \theta_i = 1,\ \xi_i \sim G \text{ with density } g.
\end{array}
\tag{2.1}
\]

The density of $\xi_i$ is then $1 - a + ag$. The assumption that $\xi_{i1},\dots,\xi_{iK}$ are
independent under true $H_i$ should be checked carefully. Generally speaking, it
should be problem-dependent to design test statistics with independent p-values
[5]. One situation in which independence may arise is where multiple data sets 
on the same nulls are collected independently following different protocols, e.g., 
with different experiment designs being used or different physical attributes be- 
ing recorded. In this situation, observations in different data sets may not follow 
the same distributions, and hence cannot be combined into larger i.i.d. samples. 
Nevertheless, the p-values derived respectively from them can be combined into 
multivariate p-values with independent coordinates. 

Recall that, for univariate p-values $\xi_1,\dots,\xi_n$, given FDR control parameter
$\alpha \in (0,1)$, the BH procedure rejects $H_i$ with

\[
\xi_i \le \tau = \sup\left\{t \in [0,1] : \frac{t}{\alpha} \le \frac{R(t) \vee 1}{n}\right\},
\qquad R(t) = \#\{i : \xi_i \le t\}.
\]

Under the random effects model (2.1), the FDR actually realized by the BH
procedure is $(1-a)\alpha$ [3, 11, 22], implying that the FDR can be arbitrarily
small. On the other hand, the "local FDR" associated with each $H_i$ is [10]

\[
P(\theta_i = 0 \mid \xi_i) = \frac{1-a}{1-a+a\,g(\xi_i)} \ge \frac{1-a}{1-a+a\sup g}.
\]

It is not hard to see that the inequality applies to multivariate p-values as 
well. Then, following [8], the next result can be established. 
Proposition 2.1. Under (2.1), for any multiple testing procedure,

\[
\text{pFDR} \ge (1-a)\alpha_*, \quad \text{where } \alpha_* = \frac{1}{1-a+a\sup g}. \tag{2.2}
\]

Oftentimes, as $\sup g < \infty$, the pFDR is bounded away from 0. In particular, if $a$ is
small while sup g is only moderately large, the minimum attainable pFDR can 
be undesirably large. This is the basis of the next example. 

2.2. An example 

To further illustrate the role multivariate statistics may have for pFDR control,
consider tests on $H_i : \mu_i = (\mu_{X,i}, \mu_{Y,i}) = 0$ for $N(\mu_i, \Sigma_i)$. Suppose for
each $H_i$, a sample of $k$ i.i.d. $(X_{ij}, Y_{ij}) \sim N(\mu_i, \Sigma_i)$ is collected. If it is
known that $\Sigma_i = \mathrm{diag}(1,1)$ and under false $H_i$, $\mu_{X,i} = \mu_{Y,i} = 1$, then by the
Neyman-Pearson lemma, among procedures using fixed thresholding, the uniformly
most powerful one is to reject $H_i$ if and only if $\bar X_i + \bar Y_i$ is greater than a
suitable threshold value, where $\bar X_i = (1/k)\sum_j X_{ij}$ and likewise for $\bar Y_i$. That
is to say, in this case univariate statistics are the best choice.

However, in most cases in practice, complete knowledge on data distributions
is unavailable. If both $\Sigma_i$ and $\mu_i$ under $H_i$ are unknown, then $\bar X_i + \bar Y_i$ cannot
be used as test statistics and t statistics are called for. Imagine a data analyst
has computed the t statistics of $X_{ij}$ and those of $Y_{ij}$ for each $H_i$, denoted
$t_{X,i}$ and $t_{Y,i}$, respectively. While FDR control can be done with either $t_{X,i}$ or
$t_{Y,i}$, the issue here is pFDR control.

Suppose the number of nulls is large and due to constraints on resources,
$k = 9$ for each $H_i$. Suppose the data analyst knows that for each $H_i$, $\Sigma_i$ is
diagonal and that for false $H_i$, $\mu_{X,i} > 0$ and $\mu_{Y,i} > 0$. However, he does not
know that for false $H_i$, $\mu_{X,i}/\sigma_{X,i} = .5$ and $\mu_{Y,i}/\sigma_{Y,i} = .4$, where
$\sigma_{X,i}^2$ and $\sigma_{Y,i}^2$ are the diagonal entries of $\Sigma_i$. If the fraction of false
nulls is 5%, then, by using $t_{X,i}$ alone, the minimum attainable pFDR is
$\approx .289$ and, by using $t_{Y,i}$ alone, the bound is even higher ($\approx .447$). The lower
bounds are a consequence of Proposition 2.1. No procedure that only uses $t_{X,i}$
or $t_{Y,i}$ can get a pFDR lower than the bounds.
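The two bounds quoted above can be checked numerically with a short sketch. It rests on assumptions not spelled out in the text: the p-values are taken to be upper-tail p-values of one-sample t statistics with k - 1 = 8 degrees of freedom, the noncentrality under false nulls is taken as sqrt(k) times the mean-to-standard-deviation ratio, and sup g is evaluated as the limit at infinity of the noncentral-to-central t density ratio using the standard series expansion of the noncentral t density. The function names are illustrative only.

```python
import math

def sup_density_ratio_t(df, delta, terms=400):
    """Limit, as t -> infinity, of the noncentral-t density (df degrees of
    freedom, noncentrality delta > 0) over the central-t density.
    Assumed here to equal sup g for upper-tail t p-values."""
    x = math.sqrt(2.0) * delta
    log_g0 = math.lgamma((df + 1) / 2.0)
    total = 0.0
    for j in range(terms):
        # log of Gamma((df+j+1)/2) / Gamma((df+1)/2) * x**j / j!
        log_term = (math.lgamma((df + j + 1) / 2.0) - log_g0
                    + j * math.log(x) - math.lgamma(j + 1))
        total += math.exp(log_term)
    return math.exp(-delta ** 2 / 2.0) * total

def min_pfdr(a, sup_g):
    """Lower bound (1 - a) * alpha_* of Proposition 2.1."""
    alpha_star = 1.0 / (1.0 - a + a * sup_g)
    return (1.0 - a) * alpha_star

k, a = 9, 0.05  # sample size per null and fraction of false nulls
bound_x = min_pfdr(a, sup_density_ratio_t(k - 1, math.sqrt(k) * 0.5))
bound_y = min_pfdr(a, sup_density_ratio_t(k - 1, math.sqrt(k) * 0.4))
print(round(bound_x, 3), round(bound_y, 3))
```

Under these assumptions the two printed values should land close to the .289 and .447 reported in the example.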

One way to attain lower pFDR is to increase k, which may require signif- 
icantly more resources. When resources are limited, a sensible solution is to 
exploit both tx,i and ty,i or, equivalently, their marginal p-values. This then 
raises the question of pFDR control using multivariate p-values. 

3. FDR control using nested regions of p-values 
3.1. General description 

Let $\{D_t, 0 \le t \le 1\}$ be a family of Borel sets in $[0,1]^K$ such that

\[
\begin{array}{l}
D_1 = [0,1]^K, \quad \ell(D_t) = t, \quad D_s \subset D_t, \quad 0 \le s < t \le 1, \\
\{D_t\} \text{ is right-continuous, i.e., } D_t = \bigcap_{s>t} D_s, \quad t \in [0,1).
\end{array}
\tag{3.1}
\]

The most familiar sets satisfying (3.1) are perhaps $D_t = [0,t]$. Let $\xi_1,\dots,\xi_n$ be
the p-values associated with $H_1,\dots,H_n$. Define

\[
R(t) = \sum_{i=1}^n 1\{\xi_i \in D_t\}, \qquad
V(t) = \sum_{i=1}^n (1-\theta_i)\, 1\{\xi_i \in D_t\}.
\]

Description Given FDR control parameter $\alpha \in (0,1)$,

\[
\text{reject } H_i \text{ if and only if } \xi_i \in D_\tau, \tag{3.2}
\]

\[
\text{where } \tau = \sup\left\{t \in [0,1] : \frac{t}{\alpha} \le \frac{R(t) \vee 1}{n}\right\}. \qquad \Box
\]

Theorem 3.1. For procedure (3.2), FDR $= (1-a)\alpha$.

Proof. Since $D_t$ is right-continuous, $\xi_i \in D_t \iff s_i \le t$, where

\[
s_i = \inf\{t \in [0,1] : \xi_i \in D_t\}. \tag{3.3}
\]

Therefore, procedure (3.2) rejects the same set of nulls as the BH procedure
applied to $s_1,\dots,s_n$ does. By $\ell(D_t) = t$, $s_i \sim \text{Unif}(0,1)$ under true $H_i$ and
hence $s_i$ are univariate p-values. Theorem 3.1 then follows from [22]. □

In general, a nested family of Borel sets in $[0,1]^K$ can often be parameterized
so that procedure (3.2) is applicable to them.

Proposition 3.1. Let $\{\Gamma_u, u \in I\}$ be a family of Borel sets in $[0,1]^K$, where $I$
is an interval in $\mathbb{R}$, such that $\Gamma_u \subset \Gamma_v$ for $u < v$ and $\{\Gamma_u\}$ is right-continuous.
Suppose $h(u) := \ell(\Gamma_u)$ is continuous and strictly increasing with $\inf h = 0$ and
$\sup h = 1$. For $t \in (0,1)$, define $D_t = \Gamma_{h^{-1}(t)}$. Also define $D_0 = \bigcap_u \Gamma_u$ and
$D_1 = [0,1]^K$. Then procedure (3.2) based on $D_t$ attains FDR $= (1-a)\alpha$.

As $\ell(D_t) = t$, $D_t$ will be referred to as the regularization of $\Gamma_u$. Since nested
regions naturally occur as decision regions in hypothesis tests, as seen below,
by regularization, a test can turn into an FDR controlling procedure.

Example 3.1. (a) Suppose a test rejects a null if and only if $\min \xi_k \le u$,
where $\xi = (\xi_1,\dots,\xi_K)'$ is the associated p-value and $u$ a threshold value. The
corresponding rejection region is $\Gamma_u = \{x \in [0,1]^K : \min x_k \le u\}$. Then
$\{\Gamma_u, u \in [0,1]\}$ is an increasing family of sets. Since $h(u) = 1 - (1-u)^K$,
procedure (3.2) applies to $D_t = \Gamma_{h^{-1}(t)}$ with $h^{-1}(t) = 1 - (1-t)^{1/K}$. Note that,
in the Sidak procedure, when $K$ hypotheses are tested simultaneously, $h^{-1}(t)$ is
the significance level for each hypothesis in order to attain familywise
significance level $t$.

(b) Suppose a test rejects a null if and only if $\prod \xi_k \le u$, where $u > 0$ is
a threshold value. The corresponding rejection region is
$\Gamma_u = \{x \in [0,1]^K : \prod x_k \le u\}$. For $K = 2$, $h(u) = u(1 + \ln u^{-1})$. In general,
$h(u) = P(\prod U_k \le u)$, with $U_k$ i.i.d. $\sim \text{Unif}(0,1)$. Since $-\ln U_k$ has density
$e^{-x} 1\{x > 0\}$, $h(u) = 1 - F_K(-\ln u)$, where $F_K$ is the Gamma distribution with
$K$ degrees of freedom and scale parameter 1. □
For $\{D_t, 0 \le t \le 1\}$ having a regular representation, procedure (3.2) has an
equivalent description more amenable to numerical evaluation. Suppose there is
a function $0 \le J \le 1$, such that $D_t = \{x \in [0,1]^K : J(x) \le t\}$. It is easy to see
$D_1 = [0,1]^K$ and $D_t$ is right-continuous for $t \in [0,1)$. Since
$s_i = \inf\{t \in [0,1] : J(\xi_i) \le t\} = J(\xi_i)$, the next description obtains.

Equivalent Description Given FDR control level parameter $\alpha \in (0,1)$,

\[
\text{apply the BH procedure to } s_i = J(\xi_i). \tag{3.4}
\]

That is, sort $s_1,\dots,s_n$ into $s_{(1)} \le s_{(2)} \le \dots \le s_{(n)}$. Define $s_{(0)} = 0$ and set
$l = \max\{k \ge 0 : s_{(k)}/\alpha \le k/n\}$. Then reject $H_i$ if $s_i \le s_{(l)}$. □
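The sorting step in the equivalent description (3.4) can be sketched in a few lines. This is only a minimal illustration: the function name `bh_reject` and the use of a plain list of regularized values s_i = J(xi_i) are choices made for the example, not part of the paper.

```python
def bh_reject(s, alpha):
    """Return the sorted indices i of the nulls rejected by the BH step of
    (3.4): with s_(1) <= ... <= s_(n), take the largest l such that
    s_(l) / alpha <= l / n and reject every H_i with s_i <= s_(l)."""
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i])  # indices by increasing s_i
    l = 0                                         # l = 0 means no rejection
    for k, i in enumerate(order, start=1):
        if s[i] / alpha <= k / n:
            l = k
    return sorted(order[:l])

# Five hypothetical regularized values s_i = J(xi_i).
rejected = bh_reject([0.01, 0.9, 0.03, 0.5, 0.02], alpha=0.1)
print(rejected)
```

Because l is the largest index passing the comparison, any value tied with s_(l) is automatically included, so taking the first l sorted indices matches the rule "reject H_i if s_i <= s_(l)".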

Example 3.1 (continued)

(a) Since $h(u) = 1 - (1-u)^K$ is strictly increasing, $h^{-1}$ exists and
$D_t = \{x \in [0,1]^K : \min x_k \le h^{-1}(t)\} = \{x \in [0,1]^K : h(\min x_k) \le t\}$.
Therefore $J(x) = 1 - (1 - \min x_k)^K$.

(b) In this case $D_t = \{x \in [0,1]^K : h(\prod x_k) \le t\}$, where $h(u) = 1 - F_K(-\ln u)$.
Then $J(x) = 1 - F_K(-\sum \ln x_k)$. For $K = 2$, since $h(u) = u(1 + \ln u^{-1})$,
$J(x, y) = xy\,[1 - \ln x - \ln y]$.
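As a small illustration of the two regularizations in Example 3.1, the functions J can be written out directly. The sketch below assumes all coordinates of a p-value are strictly positive in case (b), and it evaluates the Gamma(K, 1) upper tail through the standard closed form for integer shape, exp(-y) * sum_{j<K} y^j / j!; the function names are hypothetical.

```python
import math

def j_min(x):
    """J(x) = 1 - (1 - min x_k)^K, the regularization for the min rule."""
    K = len(x)
    return 1.0 - (1.0 - min(x)) ** K

def j_prod(x):
    """J(x) = 1 - F_K(-sum ln x_k), the regularization for the product rule.
    Requires every coordinate of x to be > 0."""
    K = len(x)
    y = -sum(math.log(v) for v in x)
    # Upper tail of Gamma(K, 1) with integer shape K.
    return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(K))

print(j_min([0.2, 0.5]), j_prod([0.2, 0.5]))
```

For K = 2 the second function reduces to xy(1 - ln x - ln y), in agreement with the formula given in part (b). Either function can be fed to the BH step of (3.4).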

3.2. Regions with maximum power under consistency condition 

If the distribution under true nulls and that under false nulls are known, then,
in light of the Neyman-Pearson lemma, it is natural to ask if FDR can be controlled
with maximum possible power using the likelihood ratios of the test statistics.
Some work has been done on this idea [18, 21]. We next show that the idea
is correct under certain conditions and can be realized by procedure (3.2) with
an appropriate nested family $\{D_t, t \in [0,1]\} \subset [0,1]^K$.

Let $X = (X_1,\dots,X_K)' \in \mathbb{R}^K$ be a test statistic. Suppose that under true
nulls, $X \sim Q_0$ with density $q_0$ and under false nulls, $X \sim Q_1$ with density $q_1$.
Our construction of $D_t$ is based on a familiar transformation of $X$ into
multivariate p-values. Denote by $f_k(x_1,\dots,x_k)$ the marginal density of
$X_1,\dots,X_k$ under $Q_0$. Clearly $q_0 = f_K$.

Lemma 3.1. Let $\phi(x) = (\phi_1(x),\dots,\phi_K(x))'$, $x \in \mathbb{R}^K$, such that

\[
\phi_1(x) = Q_0(X_1 \le x_1), \qquad
\phi_k(x) = Q_0(X_k \le x_k \mid X_s = x_s,\ s < k), \quad k > 1.
\]

Let $\xi = \phi(X)$, i.e., $\xi_1$ is the p-value of $X_1$ and for $k > 1$, $\xi_k$ is the conditional
p-value of $X_k$. Suppose i) $\mathrm{sppt}(q_1) \subset \mathrm{sppt}(q_0)$, where $\mathrm{sppt}(q) := \{x : q(x) > 0\}$,
ii) all $f_k$ are continuous and iii) $q_1$ is continuous on $\mathrm{sppt}(q_0)$. Then
1) $\phi : \mathrm{sppt}(q_0) \to [0,1]^K$ is continuous and 1-to-1; 2) under true nulls,
$\xi_1,\dots,\xi_K$ are i.i.d. $\sim \text{Unif}(0,1)$; and 3) under false nulls, $\xi$ has a
continuous density $g(x) := q_1(\phi^{-1}(x))/q_0(\phi^{-1}(x))$ on $E := \phi(\mathrm{sppt}(q_0))$, which
is open with $\ell(E) = 1$.
Condition i) is not restrictive because hypothesis testing is trivial when
$X = x \in \mathrm{sppt}(q_1) \setminus \mathrm{sppt}(q_0)$ (cf. [18]). Conditions ii) and iii) are used to make
sure all the transformations involved on $X$ are still well-defined random variables.
Under these conditions, any rule based on the likelihood ratio of $X$ has an
equivalent based on $\xi$. In the rest of the section, we will only consider tests
on p-values.

A multiple testing procedure can be regarded as a deterministic or random
function $\delta$ that maps each data point to an n-tuple $(\delta_1,\dots,\delta_n)$ with
$\delta_i = 1\{H_i \text{ is rejected}\}$. In our setup, the data point is an n-tuple
$(\xi_1,\dots,\xi_n)$ jointly distributed with $\theta = (\theta_1,\dots,\theta_n)$. We need two
conditions on $\delta$.

(A) For $n \ge 1$, $\delta$ and $\theta$ are independent conditional on $\xi_1,\dots,\xi_n$, i.e.,
$P(\delta = a \mid \xi_1,\dots,\xi_n, \theta = b) = P(\delta = a \mid \xi_1,\dots,\xi_n)$, $a, b \in \{0,1\}^n$.

Most multiple testing procedures are deterministic functions of test statistics
and therefore satisfy condition (A). The condition means that the observed
test statistics contain all the available information on $\theta$; if there is any prior
knowledge on $\theta$, it has already been fully incorporated into $\xi$ and hence any
randomness introduced into $\delta$ is a "pure guess".

The second condition imposes some consistency on $\delta$. For an n-tuple
$S = (x_1,\dots,x_n)$ with $x_i \in [0,1]^K$, denote $R(S;\delta) = \sum_{i=1}^n \delta_i(S)$ and $F(x;S)$ the
empirical distribution function

\[
F(x; S) = \#\{i : x_{ik} \le x_k,\ k = 1,\dots,K\}/n.
\]

(B) For any sequence of $n_k$-tuples $S_k$ with $n_k \to \infty$, if $F(x; S_k)$ converges in
the sense that $\sup_x |F(x; S_k) - F(x)| \to 0$ for a distribution function $F$, then
$R(S_k;\delta)/n_k$ converges in probability.

Basically, the condition requires that, when $\delta$ is applied to samples with
similar empirical distributions, it should reject similar fractions of nulls from
them. Loosely speaking, that means as far as the fraction of rejected nulls is
concerned, $\delta$ has to "stick to" a single way of testing, rather than alternate
between different ways for different data sets.

For $0 \le u < \infty$, define

\[
\Gamma_u = \{x \in [0,1]^K : g(x) \ge u\}. \tag{3.5}
\]

Although $\{\Gamma_u\}$ is decreasing instead of increasing, its regularization can be made
increasing. Let $h(u) = \ell(\Gamma_u)$. Then $h$ is decreasing, $h(0) = 1$ and $h(u) \to 0$ as
$u \to \infty$. Define $h^*(t) = \inf\{u \ge 0 : h(u) \le t\}$.

Proposition 3.2. Suppose conditions in Lemma 3.1 are satisfied and $h$
is continuous. Then $D_t = \Gamma_{h^*(t)}$ satisfies (3.1). Let $\alpha \in (0,1)$. Then, among
procedures that satisfy conditions (A) and (B) and attain FDR $\le (1-a)\alpha$,
procedure (3.2) with $D_t$ belongs to those which asymptotically have the maximum
power as $n \to \infty$. Furthermore, the following statements on the procedure hold:
1) it always rejects the same set of nulls as the BH procedure does when applied
to $p_i = h(g(\xi_i))$, $i = 1,\dots,n$; 2) for $\alpha > \alpha_*$, the power is asymptotically positive
and pFDR $\to$ FDR $= (1-a)\alpha$; and 3) for $\alpha < \alpha_*$, the power is asymptotically
0 and pFDR $\to (1-a)\alpha_*$.

Example 3.2. To illustrate that in general condition (B) is needed in
Proposition 3.2, let $K = 1$. First consider the case where the p-values
$\xi_1,\dots,\xi_n$ are i.i.d. $\sim F(t) = (1-a)t + aG(t) \in C^1([0,1])$ such that $t/F(t)$ is
strictly increasing and $G(t)$ is linear on $[t_1, t_2]$, where $0 < t_1 < t_2 < 1$. It is easy
to see that $t/F(t)$ is strictly concave on $[t_1, t_2]$. Given $c \in (0,1)$, consider the
following randomized procedure. Draw $U \sim \text{Unif}(0,1)$. If $U > c$, reject and only
reject nulls with $\xi_i \in [0, t_1]$; otherwise, reject and only reject nulls with
$\xi_i \in [0, t_2]$. As $n \to \infty$, the empirical distribution of $\xi_i$ converges to $F$.
However, conditional on $U > c$, $R_n/n \to F(t_1)$, while conditional on $U \le c$,
$R_n/n \to F(t_2)$. Therefore, the procedure satisfies condition (A) but not (B). It
can be seen that pFDR $\to (1-a)\alpha$ with $\alpha = (1-c)\,t_1/F(t_1) + c\,t_2/F(t_2)$ and
power $\to (1-c)G(t_1) + cG(t_2) = G(t_c)$, with $t_c = (1-c)t_1 + ct_2$.

Consider the BH procedure when it is applied to $\xi_1,\dots,\xi_n$ with control
parameter $\alpha$. Since $1/F'(0) < \alpha < 1$, by [12], the procedure asymptotically has
power $G(t^*)$, where $t^* \in (0,1)$ such that $t^*/F(t^*) = \alpha$. Since $t/F(t)$ is strictly
increasing, $t_1 < t^* < t_2$. On the other hand, since $t/F(t)$ is strictly concave on
$[t_1, t_2]$, $t_c/F(t_c) > \alpha$. As a result, $t_c > t^*$. Therefore, asymptotically, although
the BH procedure has the same pFDR level as the randomized procedure, it is
strictly less powerful.

Finally, given $c \in (0,1)$, by small variation to $G$ on $[t_1, t_2]$, one can construct
$G$ which is smooth and strictly concave, such that the above conclusions still
hold. By Proposition 3.2, the most powerful procedure (3.2) satisfying condition
(B) in this case is the BH procedure and hence is strictly less powerful than the
randomized procedure at the same pFDR level. □

As noted in a discussion in [4], by either accepting all nulls with probability
$1 - \alpha$ or rejecting all of them with probability $\alpha$, it is guaranteed that FDR $\le \alpha$;
however, the FDR attained in this way is useless, because it cannot say how 
well one can learn from the data being analyzed. Without some coherence of a 
procedure, one can hardly make a sensible evaluation of its performance in a 
particular instance based on a measure defined as a long term average, as the 
measure incorporates not only the way of testing chosen for the data at hand, 
but also others that are potentially very different. The same comments apply to 
pFDR as well. Condition (B) aims to impose some coherence, which is possible 
if the data follows the law of large numbers. This is similar to the ergodicity 
assumption, whereby long term average can be approximated by an average over 
a single large sample. 

The construction in this section requires full knowledge of the density g, 
which is often unavailable. However, if g is known to possess some regularities, 
then it is possible to apply procedure (3.2) to regular shaped Dt with reasonable 
power. This possibility is explored next. 
3.3. Nonconstant approximation of lowest order

In many cases, the following is true for the distribution $G$ of p-values under false
nulls:

\[
G \text{ has density } g \in C([0,1]^K) \text{ with } g(x) < g(0) < \infty \text{ for all } x \ne 0. \tag{3.6}
\]

Under (3.6), smaller p-values are stronger evidence against nulls; nevertheless,
the strength is bounded. By Proposition 2.1, the minimum attainable pFDR is
$(1-a)\alpha_*$, where $\alpha_*$ is now equal to $1/[1 - a + a\,g(0)]$. Following Taylor's expansion,
suppose for some $\gamma_k > 0$ and $\epsilon > 0$,

\[
g(x) = g(0)\,(1 - \gamma' x^\epsilon + r(x)) \quad \text{with } r(x) = o(|x|^\epsilon) \text{ as } |x| \to 0, \tag{3.7}
\]

where $x^\epsilon$ denotes $(x_1^\epsilon,\dots,x_K^\epsilon)'$. It is perhaps desirable and expected to be true
that $\epsilon$ is a positive integer. However, under regular conditions, this usually is not
the case. As will be seen in Section 5, for the upper-tail p-values associated with
t or F statistics, $\epsilon$ usually is a fraction of 1. More generally,
$g(x) = g(0)(1 - \sum \gamma_k x_k^{\epsilon_k}) + o(\sum x_k^{\epsilon_k})$ with $\epsilon_k > 0$ possibly different.
However, for simplicity, this case will not be discussed.

Rewrite the region in (3.5) as $\{x \in [0,1]^K : g(x) \ge g(0)(1-u)\}$. For
$0 < u \ll 1$, the region is approximately $\{x \in [0,1]^K : \gamma' x^\epsilon \le u\}$, suggesting that
the latter may be used in procedure (3.4) with reasonable power. In general, for
$\nu = (\nu_1,\dots,\nu_K)'$ with $\nu_k \ge 0$ and $\sum \nu_k > 0$, define $\Gamma_u(\nu)$ as

\[
\Gamma_u(\nu) = \{x \in [0,1]^K : \nu' x^\epsilon \le u\}. \tag{3.8}
\]

Then by procedure (3.4), the following procedure obtains.

Control based on regions (3.8) Given FDR control parameter $\alpha \in (0,1)$,

\[
\text{apply the BH procedure to } s_i = h(\nu' \xi_i^\epsilon; \nu), \text{ where } h(u;\nu) = \ell(\Gamma_u(\nu)). \tag{3.9}
\]
□

Procedure (3.9) is "scale invariant" in $\nu$, i.e., the set of nulls rejected by using
$\Gamma_u(c\nu)$ is the same for $c > 0$. If $K = 1$, then the procedure is simply the BH
procedure and the parameter $\nu = \nu_1$ has no effect on its performance. However,
when $K > 1$, the power of procedure (3.9) depends on $\nu$. To analyze the power,
denote

\[
V_\epsilon = \ell(\{x \in [0,\infty)^K : x_1^\epsilon + \dots + x_K^\epsilon \le 1\}).
\]

The next lemma will be used. 
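Lemma 3.2. Given $\nu = (\nu_1,\dots,\nu_K)'$ with $\nu_k > 0$, let $\bar\nu$ be the geometric mean
of $\nu_k$, i.e., $\bar\nu = (\nu_1 \cdots \nu_K)^{1/K}$. Then for $0 < u < \min \nu_k$,

\[
h(u;\nu) = V_\epsilon \left(\frac{u}{\bar\nu}\right)^{K/\epsilon}, \tag{3.10}
\]

\[
\int_{\Gamma_u(\nu)} x_k^\epsilon \, dx = \frac{V_\epsilon}{K+\epsilon}\,\frac{u}{\nu_k} \left(\frac{u}{\bar\nu}\right)^{K/\epsilon},
\quad k = 1,\dots,K. \tag{3.11}
\]

Furthermore, as $u \downarrow 0$,

\[
\int_{\Gamma_u(\nu)} g = g(0)\, V_\epsilon \left(\frac{u}{\bar\nu}\right)^{K/\epsilon}
\left(1 - \frac{u}{K+\epsilon} \sum_k \frac{\gamma_k}{\nu_k}\right) + o(u^{K/\epsilon+1}). \tag{3.12}
\]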

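3.4. Special cases

In most cases, $h(u;\nu)$ is complicated to evaluate. There are two cases that allow
tractable numerical evaluation of $h(u;\nu)$. The first case is $K = 2$. Suppose
$\nu_2 \ge \nu_1 > 0$. For $u \ge \nu_1 + \nu_2$, it is clear $h(u;\nu) = 1$. For $0 < u < \nu_1 + \nu_2$,

\[
h(u;\nu) = \left[\frac{(u - \nu^*)^+}{\nu_*}\right]^{1/\epsilon}
+ \frac{\Gamma(1/\epsilon)^2}{2\epsilon\,\Gamma(2/\epsilon)} \left(\frac{u}{\bar\nu}\right)^{2/\epsilon}
\left[ F\!\left(\frac{\nu_*}{u} \wedge 1;\ \frac{1}{\epsilon},\ 1 + \frac{1}{\epsilon}\right)
- F\!\left(\left(1 - \frac{\nu^*}{u}\right)^+;\ \frac{1}{\epsilon},\ 1 + \frac{1}{\epsilon}\right) \right],
\tag{3.13}
\]

where $\nu_* = \min(\nu_1, \nu_2)$, $\nu^* = \max(\nu_1, \nu_2)$ and $F(x; a, b)$ is the Beta distribution
with parameters $a$ and $b$. See Appendix A.1 for a proof of (3.13).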

The second case is $\nu_1 = \dots = \nu_K$ and $\epsilon = 1$, where $h(u;\nu)$ can be evaluated
by recursion. Due to scale invariance, let $\nu_k = 1$. Then

\[
h_K(u) := h(u; 1,\dots,1)
\]

is piecewise polynomial, such that $h_K(u) = h_{K,\lfloor u \rfloor}(\{u\})$, where $\lfloor u \rfloor$ is the largest
integer no greater than $u$, $\{u\} = u - \lfloor u \rfloor$, and

\[
h_{K,i}(t) := h_K(t + i) = \sum_{k=0}^{K} A_K(i,k)\, t^k, \quad t \in [0,1).
\]

Since $h_{K,K}(t) = 1$, $h_{K,0}(t) = t^K/K!$ and, for $i = 1,\dots,K-1$,

\[
\begin{aligned}
h_{K,i}(t) &= \int_0^t h_{K-1}(t + i - x)\,dx + \int_t^1 h_{K-1}(t + i - x)\,dx \\
&= \int_0^t h_{K-1,i}(t - x)\,dx + \int_t^1 h_{K-1,i-1}(1 + t - x)\,dx \\
&= \int_0^t h_{K-1,i}(x)\,dx + \int_t^1 h_{K-1,i-1}(x)\,dx.
\end{aligned}
\tag{3.14}
\]

It follows that for $i = 1,\dots,K-1$ and $k = 1,\dots,K$,

\[
\begin{cases}
A_K(0,0) = \dots = A_K(0,K-1) = 0, \quad A_K(0,K) = 1/K!, \\
A_K(i,0) = \sum_{k=0}^{K-1} A_{K-1}(i-1,k)/(k+1), \\
A_K(i,k) = [A_{K-1}(i,k-1) - A_{K-1}(i-1,k-1)]/k, \\
A_K(K,0) = 1, \quad A_K(K,k) = 0.
\end{cases}
\tag{3.15}
\]

By $h_1(u) = u$, the initial conditions are $A_1(0,0) = 0$, $A_1(0,1) = 1$. These
relations together with (3.14) can be used to compute $h_K(u)$.
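The recursion for the equal-weight, exponent-one case can be sketched as below. The reading of (3.15) used here, including the boundary values A_K(0, K) = 1/K! and A_K(K, 0) = 1, is an assumption drawn from (3.14) and from h_K being the volume of {x in [0,1]^K : x_1 + ... + x_K <= u}. As an independent cross-check, which is not part of the paper, the result is compared with the well-known Irwin-Hall distribution function. Exact rational arithmetic is used so that the two can be compared without rounding; all function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, floor

def coefficients(K):
    """Rows [A_K(i, 0), ..., A_K(i, K)] for i = 0..K, built by recursion."""
    rows = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]  # K = 1
    for m in range(2, K + 1):
        prev = rows
        rows = [[Fraction(0)] * m + [Fraction(1, factorial(m))]]     # i = 0
        for i in range(1, m):
            const = sum(prev[i - 1][k] / (k + 1) for k in range(m))
            rest = [(prev[i][k - 1] - prev[i - 1][k - 1]) / k
                    for k in range(1, m + 1)]
            rows.append([const] + rest)
        rows.append([Fraction(1)] + [Fraction(0)] * m)               # i = m
    return rows

def h_equal_weights(K, u):
    """h_K(u): Lebesgue measure of {x in [0,1]^K : sum of x_k <= u}."""
    u = Fraction(u)
    if u <= 0:
        return Fraction(0)
    if u >= K:
        return Fraction(1)
    i = floor(u)
    t = u - i
    return sum(c * t ** k for k, c in enumerate(coefficients(K)[i]))

def irwin_hall_cdf(K, u):
    """Closed-form cross-check for 0 <= u <= K."""
    u = Fraction(u)
    return sum((-1) ** j * comb(K, j) * (u - j) ** K
               for j in range(floor(u) + 1)) / factorial(K)

print(h_equal_weights(3, Fraction(3, 2)))
```

In this case procedure (3.9) amounts to applying the BH step to s_i = h_K(xi_i1 + ... + xi_iK).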



4. Analysis of power 

Recall that power $= E[(R - V)/((n - N) \vee 1)]$, with $n$ the number of nulls and $N$
that of true nulls. As our focus is the case $n \gg 1$, we shall consider the limit
$\mathrm{Pow}(\alpha)$ of power at pFDR level $(1-a)\alpha$ as $n \to \infty$. In general, closed form
formulas for $\mathrm{Pow}(\alpha)$ are not available. To get a handle on $\mathrm{Pow}(\alpha)$, our approach
is to look at how fast it drops to 0 as $\alpha \downarrow \alpha_*$, by approximating $\mathrm{Pow}(\alpha)$ as a
linear combination of $(\alpha - \alpha_*)^{c}$, $c \in [0,\infty)$, or for that matter, $G(D_t)$ as a
linear combination of $t^{c}$, with $D_t$ a family of nested regions used by procedure
(3.2). Thus, the analysis is essentially a type of Taylor's expansion, which can
provide useful qualitative information for comparing powers of different
procedures.

Our analysis will only yield approximations of low orders. It remains to be
seen how high order approximations can be obtained. In order to apply the
results in Section 3, we shall assume

\[
g \text{ satisfies (3.6) and (3.7) such that } \ell(\Gamma_u) \text{ is continuous,}
\]

where $\Gamma_u$ is defined in (3.5).



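4.1. Dependence on parameter values

For $\alpha$ close to $\alpha_*$, the dependency of the power of procedure (3.9) on $\nu$ can be
characterized as follows.

Proposition 4.1. Fix $\nu_k > 0$. Then for procedure (3.9), the minimum attainable
pFDR is $(1-a)\alpha_*$ and

\[
\mathrm{Pow}(\alpha) \sim g(0)\, V_\epsilon
\left[ \frac{K + \epsilon}{a\,\alpha_*^2\, g(0)\, \bar\nu \sum_k \gamma_k/\nu_k} \right]^{K/\epsilon}
(\alpha - \alpha_*)^{K/\epsilon} \quad \text{as } \alpha \downarrow \alpha_*. \tag{4.1}
\]

Due to the scale invariance of procedure (3.9), let $\nu_1 \cdots \nu_K = \gamma_1 \cdots \gamma_K$. Let
$\lambda_k = \gamma_k/\nu_k$. Then $\bar\nu \sum \gamma_k/\nu_k = \bar\gamma \sum \lambda_k$, which is minimized under the
constraint $\lambda_1 \cdots \lambda_K = 1$ if and only if $\lambda_k \equiv 1$. It follows that as $\alpha \downarrow \alpha_*$,
$\mathrm{Pow}(\alpha)$ asymptotically is maximized if $\nu = \gamma$ and

\[
\sup \mathrm{Pow}(\alpha) \sim g(0)\, V_\epsilon
\left[ \frac{1 + \epsilon/K}{a\,\alpha_*^2\, g(0)\, \bar\gamma} \right]^{K/\epsilon}
(\alpha - \alpha_*)^{K/\epsilon}.
\]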
Recall that procedure (3.2) based on the regions in (3.5) has the maximum
power for $\alpha > \alpha_*$ among procedures satisfying conditions (A) and (B). The next
result says that, at $\nu = \gamma$, procedure (3.9) and this procedure are asymptotically
equivalent, i.e., as $\alpha \downarrow \alpha_*$, they not only have about the same power, but also
reject about the same sets of nulls.

Proposition 4.2. For $\alpha > \alpha_*$, let $\mathrm{Pow}_o(\alpha)$ be the limit of power of procedure
(3.2) based on (3.5), and $\mathrm{Pow}(\alpha)$ that of procedure (3.9) with $\nu = \gamma$. Then
$\mathrm{Pow}_o(\alpha)/\mathrm{Pow}(\alpha) \to 1$, as $\alpha \downarrow \alpha_*$.

Moreover, let $\mathcal{V}_o$ and $\mathcal{D}_o$ be the sets of true nulls and false nulls, respectively,
that are rejected by the first procedure and $\mathcal{V}$ and $\mathcal{D}$ those by the second one.
Let $r_{\mathcal{D}}(\alpha)$ be the in-probability limit of $|\mathcal{D}_o \triangle \mathcal{D}|/|\mathcal{D}_o \cap \mathcal{D}|$ and $r_{\mathcal{V}}(\alpha)$ that of
$|\mathcal{V}_o \triangle \mathcal{V}|/|\mathcal{V}_o \cap \mathcal{V}|$ as $n \to \infty$. Then $r_{\mathcal{D}}(\alpha) \to 0$ and $r_{\mathcal{V}}(\alpha) \to 0$ as $\alpha \downarrow \alpha_*$.

4.2. Other types of nested regions

In order to compare the power of procedure (3.9) and that of procedure (3.2)
based on other types of nested regions, the following comparison lemma will be
used, which says that if a nested family of regions can "round up" more false
nulls, then procedure (3.2) based on the regions has more power.

Lemma 4.1. Let $\{D_{it}\}$, $i = 1, 2$, be two families of Borel sets satisfying (3.1).
Suppose $G(D_{it})$ are continuous in $t$ and there is $T \in (0,1)$, such that for
$0 < t < T$, $G(D_{1t}) \le G(D_{2t})$. Given $\alpha \in (\alpha_*, 1)$, for the procedure based on $D_{it}$,
let $\tau_i$ be defined as in (3.2). Assume that as $n \to \infty$, $\tau_i \to t_i^* \in (0,T)$. Then
$\mathrm{Pow}_1(\alpha) \le \mathrm{Pow}_2(\alpha)$.



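To start with, for $\Gamma_u(\nu)$ in (3.8), by (3.10), the regularization is
$D_t = \Gamma_{u(t)}(\nu)$ with $u(t) = \bar\nu\,(t/V_\epsilon)^{\epsilon/K}$ if $0 < t \ll 1$. Then by (3.12),

\[
G(D_t) = g(0) \left[ 1 - \frac{\bar\nu}{K+\epsilon} \left(\frac{t}{V_\epsilon}\right)^{\epsilon/K}
\sum_k \frac{\gamma_k}{\nu_k} \right] t + o(t^{1+\epsilon/K}) \quad \text{as } t \downarrow 0. \tag{4.2}
\]

Example 4.1. A common rule is to reject a null if and only if $z = \prod \xi_k$ is small.
The rejection regions are $\Gamma'_u = \{x \in [0,1]^K : \prod x_k \le u\}$. We next show that
for $\alpha \approx \alpha_*$, procedure (3.2) based on $\Gamma'_u$ has strictly less power than procedure
(3.9) for any $\nu$ with $\nu_1 \cdots \nu_K > 0$. Roughly, the reason is that, for small $u$, most
of $\Gamma'_u$ is spread around the boundary surfaces $x_k = 0$ where the density of false
nulls is lower than that around 0.

First consider $K = 2$. By Example 3.1(b), the regularization of $\Gamma'_u$ is
$D_{1t} = \{(x,y) \in [0,1]^2 : xy \le h^{-1}(t)\}$ with $h(u) = u(1 + \ln u^{-1})$. Denote the
regularization for $\Gamma_u(\nu)$ by $D_{2t}$. As $\alpha \downarrow \alpha_*$, $t^* \to 0$. Thus, by Lemma 4.1, in
order to compare the powers for $\alpha \approx \alpha_*$, it is enough to compare $G(D_{1t})$ and
$G(D_{2t})$ for $t \ll 1$. Recall $\int_{D_{1t}} dx\,dy = t$. Fix $a_k \in (0, \gamma_k)$ and $\eta > 0$ such that
$g(x,y) \le g(0)(1 - a_1 x^\epsilon - a_2 y^\epsilon)$ for $0 \le x, y \le \eta$. Then

\[
\begin{aligned}
G(D_{1t}) &\le g(0) \left[ \int_{D_{1t}} dx\,dy
- \int_{\substack{xy \le h^{-1}(t) \\ 0 \le x, y \le \eta}} (a_1 x^\epsilon + a_2 y^\epsilon)\,dx\,dy \right] \\
&= g(0)\,[t - (a_1 + a_2)\,\eta^\epsilon\, h^{-1}(t)/\epsilon] + O(h^{-1}(t)^{1+\epsilon}) \quad \text{as } t \to 0.
\end{aligned}
\]

By (4.2), $G(D_{2t}) = g(0)[t - C t^{1+\epsilon/2}] + o(t^{1+\epsilon/2})$ with $C > 0$ a constant.
Since $h^{-1}(t)/t^{\epsilon/2+1} \to \infty$ as $t \to 0$, $G(D_{1t}) < G(D_{2t})$ for $t \ll 1$. Thus by
Lemma 4.1, $\mathrm{Pow}_1(\alpha) \le \mathrm{Pow}_2(\alpha)$ for $\alpha \approx \alpha_*$. Furthermore, the next result
implies $\mathrm{Pow}_1(\alpha) = o(\mathrm{Pow}_2(\alpha))$ as $\alpha \downarrow \alpha_*$.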

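Proposition 4.3. Under the setup in Lemma 4.1, let $D_{2t}$ be the regularization
of the regions $\Gamma_u(\nu)$ in (3.8). If $t^* \to 0$ as $\alpha \to \alpha_*$ and

\[
D_{1t} \downarrow \{0\} \quad \text{and} \quad
\frac{g(0)t - G(D_{1t})}{g(0)t - G(D_{2t})} \to M \in [1,\infty] \quad \text{as } t \to 0,
\]

then $\mathrm{Pow}_1(\alpha)/\mathrm{Pow}_2(\alpha) \to (1/M)^{K/\epsilon}$ as $\alpha \downarrow \alpha_*$.

The case $K > 2$ can be treated likewise; see Appendix A.2. □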



Example 4.2. Normal quantile transformations of p-values have been used as a
convenient representation of data in multiple testing [9]. Denote
$\bar\Phi(x) = P(Z > x)$ with $Z \sim N(0,1)$. Let $w_k > 0$, $k = 1,\dots,K$. Consider the rule
that rejects a null if and only if $Q(\xi) = \sum_k w_k \bar\Phi^{-1}(\xi_k)$ is large. The
corresponding rejection regions are $\Gamma'_u = \{x \in [0,1]^K : Q(x) \ge u\}$. As
$\bar\Phi^{-1}(\xi) \sim N(0,1)$ for $\xi \sim \text{Unif}(0,1)$,

\[
\ell(\Gamma'_u)
= P\left(\sum_{k=1}^K w_k \bar\Phi^{-1}(\xi_k) \ge u\right)
= P\left(\sum_{k=1}^K w_k Z_k \ge u\right) = \bar\Phi(u/\bar w),
\]

with $\xi_k$ i.i.d. $\sim \text{Unif}(0,1)$ and $Z_k$ i.i.d. $\sim N(0,1)$, where
$\bar w = \sqrt{\sum_k w_k^2}$. Then the regularization of $\Gamma'_u$ is
$D_{1t} = \{x \in [0,1]^K : Q(x) \ge \bar w\, \bar\Phi^{-1}(t)\}$. In Appendix A.2, it is shown that



-H9(0)t-G(D lt )} <b 



tj.o 



hit 



(l + e)AT 



(4.3) 



On the other hand, with I?2t as in Example 4.1, 
Um ln[g(0)t-G(AM)] =1 



t|0 



ln< 



For X > 1. 6 < 1 



=7 A" As a result g(°) t ^ G (- p it) 
as a result, 9 ( )t-G(£> 2t ) 



from Proposition 4.3 that Powi(a) = o(Pow2(a)) as a J, a* 



r/A. 



00 as f J. 0. It then follows 
□ 
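As an illustration of the regularization in Example 4.2, the following Python sketch computes $Q(\xi)$ and the level $t = \bar\Phi(Q(\xi)/w)$ of the smallest region $D_{1t}$ containing a given p-value vector. The function names are ours and only the standard library is assumed; this is a sketch, not the implementation used for the paper.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def upper_quantile(p):
    """Inverse upper-tail function: the x with P(Z > x) = p, for 0 < p < 1."""
    return -_STD.inv_cdf(p)


def combined_score(xi, w):
    """Q(xi) = sum_k w_k * PhiBar^{-1}(xi_k); large values favor rejection."""
    return sum(wk * upper_quantile(x) for wk, x in zip(w, xi))


def regularized_level(xi, w):
    """Level t of the smallest region {Q(x) >= w_norm * PhiBar^{-1}(t)} containing xi."""
    w_norm = sum(wk * wk for wk in w) ** 0.5
    q = combined_score(xi, w)
    return 1.0 - _STD.cdf(q / w_norm)  # PhiBar(Q / w_norm)
```

Under true nulls with independent uniform coordinates these levels are again uniform on $(0,1)$, so they can play the role of the statistics to which the BH procedure is applied. Coordinates equal to 0 or 1 would need separate handling, since `inv_cdf` is only defined on the open interval.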
Example 4.3 (Nested rectangle regions). An easy way to get nested regions is as follows. Let $0 \le f_k(t) \le 1$ be nondecreasing continuous functions on $[0,1]$, such that $\prod_k f_k(t) = t$ and $f_k(t) \to 0$ as $t \to 0$. Let

$$D_{1t} = [0, f_1(t)] \times \cdots \times [0, f_K(t)] = \{x \in [0,1]^K : f_k^*(x_k) \le t, \text{ all } k\}, \tag{4.4}$$

where $f_k^*(x) = \sup\{u : f_k(u) < x\}$. Now procedure (3.4) is the BH procedure applied to $s_i = J(\xi_i) := \max_k f_k^*(\xi_{ik})$. By Appendix A.2,

$$G(D_{1t}) = g(0)\Bigl[1 - \frac{1}{1+\varepsilon}\sum_k \gamma_k f_k(t)^\varepsilon\Bigr] t + o\Bigl(t \sum_k f_k(t)^\varepsilon\Bigr) \quad \text{as } t \downarrow 0. \tag{4.5}$$

Among all $f_k$ with $\prod_k f_k(t) = t$, the functions $f_k(t) = (\bar\gamma/\gamma_k)^{1/\varepsilon} t^{1/K}$ minimize $\sum_k \gamma_k f_k(t)^\varepsilon$. Thus the maximum of $G(D_{1t})$ is $g(0)[1 - \bar\gamma K t^{\varepsilon/K}/(1+\varepsilon)]t + o(t^{1+\varepsilon/K})$.

Let $D_{2t}$ be the regularization for $\Gamma_u(\nu)$. For $K = 1$, $D_{1t} = D_{2t}$. For $K > 1$, by (4.2), $G(D_{2t})$ is asymptotically maximized with value $g(0)[1 - \bar\gamma K L t^{\varepsilon/K}]t$ if $\nu = \gamma$, where $L = (1/V_\varepsilon)^{\varepsilon/K}/(K + \varepsilon)$. By Appendix A.3, $L < 1/(1+\varepsilon)$ for $K \ge 2$. Thus, for $\alpha \approx \alpha_*$, the maximum power of procedure (3.2) based on $D_{1t}$ is strictly less than the one based on $D_{2t}$. By Proposition 4.3,

$$\frac{\mathrm{Pow}_1(\alpha)}{\mathrm{Pow}_2(\alpha)} \to \frac{1}{V_\varepsilon}\Bigl(\frac{1+\varepsilon}{K+\varepsilon}\Bigr)^{K/\varepsilon} \in (0,1) \quad \text{as } \alpha \downarrow \alpha_*.$$

Therefore, the power by using the rectangles is of the same order as that by using $\Gamma_u(\nu)$, albeit lower.
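To get a feel for the size of this limiting ratio, the sketch below evaluates it numerically. It assumes that $V_\varepsilon$ denotes the volume of $\{x \in [0,\infty)^K : \sum_k x_k^\varepsilon \le 1\}$, which equals $\Gamma(1+1/\varepsilon)^K/\Gamma(1+K/\varepsilon)$ by the Dirichlet integral; this reading of $V_\varepsilon$ and the function names are our assumptions.

```python
from math import gamma


def unit_volume(eps, K):
    """Volume of {x >= 0 : sum_k x_k**eps <= 1} in K dimensions (assumed meaning of V_eps)."""
    return gamma(1.0 + 1.0 / eps) ** K / gamma(1.0 + K / eps)


def limiting_power_ratio(eps, K):
    """Limit of Pow_1 / Pow_2 (rectangles vs. the regions Gamma_u) as alpha decreases to alpha_*."""
    return ((1.0 + eps) / (K + eps)) ** (K / eps) / unit_volume(eps, K)
```

For instance, with $\varepsilon = 1$ and $K = 2$ the ratio is $2 \cdot (2/3)^2 = 8/9$, and for $K = 1$ it equals 1, in line with $D_{1t} = D_{2t}$ in that case.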

Procedure (3.2) based on rectangles has some advantages, even though the power is not maximum. First, rectangles are much easier to construct than the regularization of $\Gamma_u(\nu)$. Second, for $g$ satisfying (3.7), there is no need to know $\varepsilon$. Indeed, it suffices to try $f_k(t) = c_k t^{1/K}$ with $\prod_k c_k = 1$. By (3.4), the next procedure obtains.

Control based on rectangles. Given FDR control parameter $\alpha \in (0, 1)$,

$$\text{apply the BH procedure to } s_i = \max_k \Bigl(\frac{\xi_{ik}}{c_k}\Bigr)^K, \tag{4.6}$$

where $c_k > 0$ satisfy $c_1 \cdots c_K = 1$. □
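By Appendix A.2, for procedure (4.6),

$$\mathrm{Pow}(\alpha) \sim g(0)\Bigl(\frac{1+\varepsilon}{a\,g(0)\,\alpha_*^2} \Big/ \sum_k \gamma_k c_k^\varepsilon\Bigr)^{K/\varepsilon} (\alpha - \alpha_*)^{K/\varepsilon} \quad \text{as } \alpha \downarrow \alpha_*. \tag{4.7}$$

In particular, if $c_k = (\bar\gamma/\gamma_k)^{1/\varepsilon}$, the power is asymptotically maximized.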

Now suppose $\xi_{ik}$ are upper-tail probabilities of test statistics $X_{ik}$ and $f_k(t) = t^{1/K}$. Then procedure (4.6) rejects $H_i$ if and only if $\xi_{ik} \le \tau^{1/K}$ for all $k$, where $\tau$ is random. If $\tau$ is small, then procedure (4.6) may be viewed as one that rejects $H_i$ with large $\min_k X_{ik}$, which makes it seem unnatural as $\max_k X_{ik}$ is more often used in testing. However, it will be seen in Example 4.4 that procedures based on $\max_k X_{ik}$ in general cannot attain the same level of pFDR control as procedure (4.6).
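Procedure (4.6) is simple enough to sketch in a few lines. The Python code below is a minimal illustration, not the paper's implementation: it forms $s_i = \max_k (\xi_{ik}/c_k)^K$ and applies the standard BH step-up rule. The function names and the list-of-lists input format are our own choices.

```python
def bh_reject(scores, alpha):
    """Benjamini-Hochberg step-up rule: indices of nulls rejected at level alpha."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if scores[i] <= alpha * rank / n:
            cutoff = rank  # keep the largest rank passing the BH threshold
    return sorted(order[:cutoff])


def rectangle_scores(pvals, c):
    """s_i = max_k (xi_ik / c_k)**K for each row xi_i; the c_k should multiply to 1."""
    K = len(c)
    return [max((x / ck) ** K for x, ck in zip(row, c)) for row in pvals]


def rectangle_procedure(pvals, c, alpha):
    """Procedure (4.6): BH applied to the rectangle statistics."""
    return bh_reject(rectangle_scores(pvals, c), alpha)
```

A score $s_i$ may exceed 1 when some $c_k < 1$; such nulls are never rejected, since every BH threshold is at most $\alpha < 1$.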
4.3. Direct combination of test statistics

Procedure (3.9) requires p-values of test statistics. Oftentimes, procedures that 
only use simple combinations of test statistics seem more desirable because 
they do not have to evaluate p-values. However, as seen next, in many cases 
such procedures cannot attain pFDR control levels as low as procedure (3.9), 
and hence have strictly less power at low pFDR control levels. 

We only consider test statistics $X_i = (X_{i1}, \ldots, X_{iK})$ with $X_{ik}$ being independent under false $H_i$ as well as under true $H_i$. Let $X_{ik}$ follow $F_{0k}$ under true $H_i$ and $F_k$ under false $H_i$, and suppose $F_{0k}$ and $F_k$ have continuous densities $f_{0k}$ and $f_k$ respectively, such that $(0, \infty) \subset \mathrm{sppt}(f_k) \subset \mathrm{sppt}(f_{0k})$. Denote $r_k(x) = f_k(x)/f_{0k}(x)$ with $0/0$ set to 0, and suppose

$$r_k(x) \text{ is strictly increasing on } \mathrm{sppt}(f_k) \text{ but } \rho_k := \lim_{x \to \infty} r_k(x) < \infty, \tag{4.8}$$

i.e., for each $X_{ik}$, larger values are stronger evidence against $H_i$; however, the strength is bounded. Notice that, under (4.8), $\rho_k > \int r_k(x) f_{0k}(x)\,dx = \int f_k(x)\,dx = 1$.

Let $\xi_{ik}$ be the upper-tail p-value of $X_{ik}$. Then $\xi_{ik} = \bar F_{0k}(X_{ik}) = 1 - F_{0k}(X_{ik})$. Under false $H_i$, $\xi_{ik} \sim G_k(u) = 1 - F_k(F_{0k}^{-1}(1 - u))$ with density

$$g_k(u) = \frac{f_k(\varphi_k(u))}{f_{0k}(\varphi_k(u))}, \quad \text{where } \varphi_k(u) = F_{0k}^{-1}(1 - u). \tag{4.9}$$

Then $\xi_i \sim G(x) = \prod_k G_k(x_k)$ with density $g(x) = \prod_k g_k(x_k)$. By (4.8) and (4.9), $g_k$ are strictly decreasing and $g_k(0) = \rho_k \in (1, \infty)$. Then $g(0) = \prod_k \rho_k = \sup g$ and (3.6) is satisfied. If

$$g_k(u) = g_k(0)[1 - \gamma_k u^\varepsilon + o(u^\varepsilon)], \quad \text{as } u \downarrow 0,\ k = 1, \ldots, K, \tag{4.10}$$

then $g(x)$ satisfies (3.7) and so by Proposition 4.1, the minimum attainable pFDR level for procedure (3.9) is $(1 - a)\alpha_*$, with $\alpha_* = 1/(1 - a + a g(0))$. In the following examples, we shall assume (4.10) holds.

Example 4.4. One common combination of $X_{ik}$ is $M_i = \max_k X_{ik}$. Under true $H_i$, $M_i \sim \prod_k F_{0k}$ and, under false $H_i$, $M_i \sim \prod_k F_k$. By only using $M_i$, the minimum attainable pFDR is $(1 - a)/(1 - a + aL)$, where

$$L = \sup_x \frac{\sum_k f_k(x) \prod_{j \ne k} F_j(x)}{\sum_k f_{0k}(x) \prod_{j \ne k} F_{0j}(x)}.$$

In Appendix A.2, it is shown that $L < g(0)$. Thus $(1 - a)/(1 - a + aL) > (1 - a)\alpha_*$.

For procedures that reject $H_i$ if and only if $M_i$ is large enough, say, $M_i > T$, where $T \gg 1$ is a fixed threshold value, the minimum attainable pFDR can be even higher. Indeed, now pFDR $\ge (1 - a)/(1 - a + aL')$, where

$$L' = \sup_{x > T} \frac{1 - \prod_k F_k(x)}{1 - \prod_k F_{0k}(x)} = (1 + \eta_T) \sup_{x > T} \frac{\sum_k \bar F_k(x)}{\sum_k \bar F_{0k}(x)} \le (1 + \eta_T) \max_k \rho_k, \quad \text{with } \eta_T \to 0 \text{ as } T \to \infty.$$

Since $\max_k \rho_k$ can be much smaller than $g(0) = \prod_k \rho_k$, the minimum pFDR can be significantly higher than $(1 - a)\alpha_*$. □
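The gap between the two levels is easy to quantify once the limits $\rho_k$ are given. The sketch below evaluates $(1-a)/(1-a+aL)$ for $L = g(0) = \prod_k \rho_k$ and for $L = \max_k \rho_k$; the values of $\rho_k$ and $a$ are hypothetical and chosen only for illustration.

```python
def min_pfdr(a, ratio_sup):
    """(1 - a) / (1 - a + a * L) for false-null fraction a and likelihood-ratio bound L."""
    return (1.0 - a) / (1.0 - a + a * ratio_sup)


rho = [6.0, 6.0]  # hypothetical limits rho_k, not taken from the paper
a = 0.05          # fraction of false nulls

g0 = rho[0] * rho[1]                   # g(0) = prod_k rho_k
full = min_pfdr(a, g0)                 # (1 - a) * alpha_*, attainable with all coordinates
by_max_bound = min_pfdr(a, max(rho))   # limiting lower bound for the rule M_i > T, T large
```

With these numbers the first level is about 0.35 while the bound for the thresholded maximum is 0.76, so the thresholded rule cannot approach the level attainable from the full p-value vector.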

Example 4.5. Another common combination of $X_{ik}$ is $M_i = \sum_k c_k X_{ik}$, where $c_k > 0$, such that $H_i$ is rejected if and only if $M_i$ is large enough. Without loss of generality, let $K = 2$ and $c_k = 1$. Suppose $X_{ik} > 0$, such that $f_{0k}(x) \sim x^{-s_k}$ and $f_k(x) \sim \rho_k x^{-s_k}$ as $x \to \infty$ with $s_k > 1$. Then $g(0) = \rho_1 \rho_2$. Under true $H_i$, the density of $M_i$ is $\int_0^x f_{01}(t) f_{02}(x - t)\,dt$ and, under false $H_i$, it is $\int_0^x f_1(t) f_2(x - t)\,dt$. Then the minimum attainable pFDR by only using $M_i$ is $(1 - a)/(1 - a + aL)$, where

$$L = \sup_{x > T} r(x), \quad r(x) = \frac{\int_0^x f_1(t) f_2(x - t)\,dt}{\int_0^x f_{01}(t) f_{02}(x - t)\,dt},$$

with $T > 0$ a threshold value. In Appendix A.2, it is shown that $L < g(0)$. As a result, the minimum attainable pFDR is strictly greater than $(1 - a)\alpha_*$.

The same conclusion holds if $M_i$ is a weighted $L^q$-norm of $X_i$ with $q > 0$, i.e., $M_i = (\sum_k c_k X_{ik}^q)^{1/q}$. To see this, let $c_k = 1$. Under true $H_i$, $X_{ik}^q \sim G_{0k}(x) = F_{0k}(x^{1/q})$, and under false $H_i$, $X_{ik}^q \sim G_k(x) = F_k(x^{1/q})$. Then $G'_{0k}(x) \sim (1/q) x^{-t_k}$ and $G'_k(x) \sim (\rho_k/q) x^{-t_k}$, where $t_k = (s_k - 1)/q + 1 > 1$. Then the argument for $\sum_k X_{ik}$ can be applied to $G_{0k}$ and $G_k$. □

Example 4.6. Under the same conditions as in Example 4.5, assume further that $f_k(x)/f_{0k}(x) = \rho_k[1 - D_k x^{-d_k} + o(x^{-d_k})]$, with $D_k \ne 0$ and $d_k > 0$. Note that $D_k > 0$. It follows that as $x_1, \ldots, x_K \to \infty$,

$$\prod_k \frac{f_k(x_k)}{f_{0k}(x_k)} = g(0)\Bigl[1 - \sum_k D_k x_k^{-d_k} + o\Bigl(\sum_k x_k^{-d_k}\Bigr)\Bigr].$$

Therefore, in order to attain pFDR $= (1 - a)\alpha$ with $\alpha \approx \alpha_*$, a null $H_i$ should be rejected if and only if $v_i := \sum_k D_k X_{ik}^{-d_k}$ is small. In particular, if $d_k = d$, then $v_i^{-1/d}$ is a weighted $L^{-d}$-norm of $X_i$ and hence $H_i$ is rejected if and only if the norm is large.

Indeed, letting $\xi_{ik}$ be the upper p-value of $X_{ik}$, by the derivations in the next section, $v_i = \sum_k [1 + o_k(\xi_{ik})]\,\nu_k \xi_{ik}^{\varepsilon_k}$, where $\nu_k = D_k (s_k - 1)^{\varepsilon_k}$, $\varepsilon_k = d_k/(s_k - 1)$ and $o_k(u) \to 0$ as $u \to 0$. Consequently, if $\varepsilon_k = \varepsilon$, then the above procedure can be formulated as one based on $\xi_i$, such that the associated rejection regions are approximately $\Gamma_u(\nu)$ in (3.9) when contracting to 0. Provided the regularizations of the rejection regions are readily available, they can be used in procedure (3.2) for FDR control as well, with approximately the same power as procedure (3.9) as $\alpha \downarrow \alpha_*$. □



5. Examples of special distributions 

We next show $t$ and $F$ distributions satisfy (3.6) and (3.7). To this end, some general formulas are needed. Suppose $X_1, \ldots, X_K$ are test statistics that are independent not only under true nulls but also under false nulls. Let $F_{0k}$ and $f_{0k}$ be the marginal distribution and density of $X_k$ under a true null, and $F_k$ and $f_k$ those under a false null.

Suppose that, for some positive constants $a$, $b$, $c_k$, $d_k$ and $r_k$, $1 \le k \le K$,

$$f_{0k}(x) \sim a c_k x^{-a-1}, \tag{5.1}$$

$$\frac{f_k(x)}{f_{0k}(x)} = r_k - d_k x^{-b} + o(x^{-b}), \quad \text{as } x \to \infty. \tag{5.2}$$

Then, as $u \downarrow 0$, $\psi_k(u) := F_{0k}^{-1}(1 - u) \sim (c_k/u)^{1/a}$. Let $\xi_k$ be the upper-tail p-value of $X_k$ and $g_k(u)$ its density. By (4.9) and (5.2), as $u \downarrow 0$,

$$g_k(u) = r_k - d_k \psi_k(u)^{-b} + o(\psi_k(u)^{-b}) = r_k - c_k^{-b/a} d_k u^{b/a} + o(u^{b/a}), \tag{5.3}$$

implying $g_k(0) = r_k$. By independence, $\xi$ has density $g(u) = \prod_k g_k(u_k)$ which satisfies (3.6) and (3.7). Moreover, the parameters in (3.7) are

$$\varepsilon = b/a, \quad g(0) = r_1 \cdots r_K, \quad \gamma_k = c_k^{-\varepsilon} d_k / r_k. \tag{5.4}$$

Note that for $F_{0k} = N(0, \sigma^2)$ and $F_k = N(\mu_k, \sigma^2)$, where $\mu_k > 0$ and $\sigma$ is known, (5.3) does not hold, as $\sup g_k = \sup f_k/f_{0k} = \infty$. For simplicity, we next only consider $K = 1$ and omit the index $k$.
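The map from the tail constants in (5.1) and (5.2) to the parameters in (5.4) is purely algebraic, as the following sketch shows. The function name is ours, and the argument `a` is the tail exponent of (5.1), not the fraction of false nulls.

```python
from math import prod


def tail_parameters(a, b, c, d, r):
    """Parameters of (5.4) from the constants of (5.1)-(5.2).

    a, b are scalars; c, d, r are sequences with one entry per coordinate k.
    Returns (eps, g0, gammas) with eps = b/a, g0 = prod r_k,
    gammas[k] = c_k**(-eps) * d_k / r_k.
    """
    eps = b / a
    g0 = prod(r)
    gammas = [ck ** (-eps) * dk / rk for ck, dk, rk in zip(c, d, r)]
    return eps, g0, gammas
```

For example, `tail_parameters(2.0, 2.0, [1.0, 4.0], [3.0, 8.0], [2.0, 4.0])` gives $\varepsilon = 1$, $g(0) = 8$ and $\gamma = (1.5, 0.5)$.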



5.1. t distribution 

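Let $F = t_{p,\delta}$, the noncentral $t$ distribution with $p$ dfs and noncentrality parameter $\delta > 0$, and $F_0 = t_p = t_{p,0}$. The density of $t_{p,\delta}$ is

$$t_{p,\delta}(x) = \frac{A e^{-\delta^2/2}}{(p + x^2)^{(p+1)/2}} \sum_{k=0}^\infty C_k \Bigl(\frac{\sqrt{2}\,\delta x}{\sqrt{p + x^2}}\Bigr)^k,$$

where

$$A = \frac{p^{p/2}\,\Gamma((p+1)/2)}{\sqrt{\pi}\,\Gamma(p/2)}, \quad C_k = \frac{1}{k!}\,\Gamma\Bigl(\frac{p+1+k}{2}\Bigr) \Big/ \Gamma\Bigl(\frac{p+1}{2}\Bigr).$$

By $C_0 = 1$, $t_p(x) = A C_0 (p + x^2)^{-(p+1)/2} \sim A x^{-p-1}$ as $x \to \infty$. Let $z = 1 - x/\sqrt{p + x^2}$. Then $z \sim (p/2)x^{-2}$ and

$$\frac{t_{p,\delta}(x)}{t_p(x)} = e^{-\delta^2/2} \sum_{k=0}^\infty C_k (\sqrt{2}\delta)^k (1 - z)^k = e^{-\delta^2/2}\Bigl[\sum_{k=0}^\infty C_k (\sqrt{2}\delta)^k - z \sum_{k=1}^\infty k C_k (\sqrt{2}\delta)^k\Bigr] + o(z).$$

Consequently, in (5.1) and (5.2), $a = p$, $b = 2$, $c = A/p$,

$$d = \frac{p\,e^{-\delta^2/2}}{2} \sum_{k=1}^\infty k C_k (\sqrt{2}\delta)^k < \infty, \quad r = e^{-\delta^2/2} \sum_{k=0}^\infty C_k (\sqrt{2}\delta)^k < \infty.$$

Then by (5.4),

$$\varepsilon = \frac{b}{a} = \frac{2}{p}, \quad \gamma = \frac{1}{2}\Bigl[\frac{p\sqrt{\pi}\,\Gamma(p/2)}{\Gamma((p+1)/2)}\Bigr]^{2/p} \sum_{k=1}^\infty k C_k (\sqrt{2}\delta)^k \Big/ \sum_{k=0}^\infty C_k (\sqrt{2}\delta)^k.$$

Clearly, for $p > 2$, $\varepsilon < 1$.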
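The series for $r$ and $\gamma$ in the $t$ case converge quickly and can be summed directly. The sketch below is our own illustration, assuming the standard series form of the noncentral $t$ density with coefficients $C_k = \Gamma((p+1+k)/2)/(k!\,\Gamma((p+1)/2))$; it truncates the sums at a fixed number of terms and works on the log scale to avoid overflow.

```python
from math import exp, lgamma, log, pi, sqrt


def t_tail_parameters(p, delta, terms=400):
    """Return (eps, r, gamma) for t_{p,delta} against t_p, with delta > 0.

    r = g(0) for one coordinate; gamma is the coefficient in (3.7).
    The infinite series are truncated after `terms` terms.
    """
    y = sqrt(2.0) * delta
    s0 = s1 = 0.0
    for k in range(terms):
        # log of C_k * y**k, with C_k = Gamma((p+1+k)/2) / (k! * Gamma((p+1)/2))
        log_term = (lgamma((p + 1 + k) / 2) - lgamma(k + 1)
                    - lgamma((p + 1) / 2) + k * log(y))
        term = exp(log_term - delta * delta / 2)
        s0 += term        # accumulates r
        s1 += k * term    # accumulates the k-weighted sum
    eps = 2.0 / p
    base = p * sqrt(pi) * exp(lgamma(p / 2) - lgamma((p + 1) / 2))
    return eps, s0, 0.5 * base ** eps * s1 / s0
```

As a check, with $p = 2$ and $\delta = 2\sqrt{3}$ (the noncentrality $\sqrt{df+1}\,\mu$ for $\mu = 2$, $df = 2$, unit variance) the sketch gives $\gamma \approx 27.69$ and, for two independent coordinates with $a = .05$, $\alpha_* = 1/(1 - a + a r^2) \approx 2.88 \times 10^{-2}$, which agrees with the corresponding entries reported for $(\mu, df) = (2, 2, 2)$ in the simulation tables.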



5.2. F distribution 



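Let $F = F_{p,q,\delta}$, the noncentral $F$ distribution with $(p, q)$ dfs and noncentrality parameter $\delta > 0$, and $F_0 = F_{p,q} = F_{p,q,0}$. Letting $\rho = p/q$ and $z = 1/(1 + \rho x)$, $F_{p,q,\delta}$ has density

$$f_{p,q,\delta}(x) = e^{-\delta/2} \rho^{p/2} x^{p/2-1} z^{(p+q)/2} \sum_{k=0}^\infty \frac{(\delta \rho x z/2)^k}{k!\,B(p/2 + k, q/2)},$$

where $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$. The density of $F_{p,q}$ is $f_{p,q}(x) = f_{p,q,0}(x)$. As $x \to \infty$, $z \sim (1/\rho)x^{-1}$. It follows that

$$f_{p,q}(x) = \frac{\rho^{p/2} x^{p/2-1} z^{(p+q)/2}}{B(p/2, q/2)} \sim \frac{\rho^{-q/2} x^{-q/2-1}}{B(p/2, q/2)},$$

$$\frac{f_{p,q,\delta}(x)}{f_{p,q}(x)} = e^{-\delta/2} \sum_{k=0}^\infty \frac{C_{p,q,k}}{k!} \Bigl(\frac{\delta}{2}\Bigr)^k (1 - z)^k = e^{-\delta/2} \Bigl[\sum_{k=0}^\infty \frac{C_{p,q,k}}{k!} \Bigl(\frac{\delta}{2}\Bigr)^k - \frac{1}{\rho x} \sum_{k=1}^\infty \frac{C_{p,q,k}}{(k-1)!} \Bigl(\frac{\delta}{2}\Bigr)^k\Bigr] + o(z),$$

where $C_{p,q,k} = B(p/2, q/2)/B(p/2 + k, q/2)$. Thus in (5.1) and (5.2),

$$a = q/2, \quad b = 1, \quad c = \frac{2\rho^{-q/2}}{q B(p/2, q/2)},$$

$$d = \frac{e^{-\delta/2}}{\rho} \sum_{k=1}^\infty \frac{C_{p,q,k}}{(k-1)!} \Bigl(\frac{\delta}{2}\Bigr)^k < \infty, \quad r = e^{-\delta/2} \sum_{k=0}^\infty \frac{C_{p,q,k}}{k!} \Bigl(\frac{\delta}{2}\Bigr)^k < \infty,$$

and by (5.4), $\varepsilon = b/a = 2/q$,

$$\gamma = \Bigl[\frac{q B(p/2, q/2)}{2}\Bigr]^{2/q} \sum_{k=1}^\infty \frac{C_{p,q,k}}{(k-1)!} \Bigl(\frac{\delta}{2}\Bigr)^k \Big/ \sum_{k=0}^\infty \frac{C_{p,q,k}}{k!} \Bigl(\frac{\delta}{2}\Bigr)^k.$$

Clearly, for $q > 2$, $\varepsilon < 1$.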
6. Numerical study

We report a simulation study on the procedures described in previous sections. The simulations are implemented in R language [17]. We focus on testing mean vectors of multivariate normals with only partial knowledge about their variances. Model (2.1) is used to sample $H_1, \ldots, H_n$, such that $H_i$ is "$\mu_i = 0$ in $N(\mu_i, \Sigma_i)$" and $1\{H_i \text{ is true}\} \sim \mathrm{Bernoulli}(1 - a)$ with $a = .05$ or $.02$, where $\mu_i \in \mathbb{R}^K$. We mimic the situation where the only available information is 1) under true $H_i$, $\Sigma_i$ is diagonal and 2) under false $H_i$, all the coordinates of $\mu_i$ are positive. Since $\Sigma_i$ are not necessarily the same for different $H_i$ and there is no knowledge on their relations whatsoever, $\Sigma_i$ cannot be estimated by pooling the observations. Since $\Sigma_i$ are unknown and the values of $\mu_i$ under false $H_i$ are also unknown, the tests have to rely on $t$-statistics. In the simulations, for each $H_i$, an i.i.d. sample $X_{i1}, \ldots, X_{i,df+1} \sim N(\mu_i, \Sigma_i)$ is drawn and the test statistic is computed as $(T_{i1}, \ldots, T_{iK})'$, where $T_{ik} = \sqrt{df+1}\,\bar X_{ik}/S_{ik}$, with $\bar X_{ik}$ and $S_{ik}$ the mean and standard deviation of the $k$-th coordinates of $X_{ij}$. It follows that for true $H_i$, $T_{ik} \sim t_{df}$ and for false $H_i$, $T_{ik} \sim t_{df,\,\sqrt{df+1}\,\mu_{ik}/\sigma_{ik}}$, where $\mu_{ik}$ is the $k$-th coordinate of $\mu_i$ and $\sigma_{ik}^2$ the $k$-th diagonal entry of $\Sigma_i$. The corresponding p-value is $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})'$, with $\xi_{ik}$ the marginal upper-tail p-values of $T_{ik}$ under $t_{df}$.

6.1. Bivariate p-values 

In this part, we compare procedures (3.9) and (4.6). Throughout, $K = 2$ and FDR control parameter $\alpha = .15$. For true $H_i$, $N(\mu_i, \Sigma_i) = N(0, I_K)$ and for false $H_i$, $N(\mu_i, \Sigma_i) = N(\mu, \Sigma(r))$, where the diagonal entries of $\Sigma(r)$ are equal to 1 and off-diagonal entries equal to $2r/(1 + r^2)$. For $r = 0$, the coordinates of $\xi_i$ are independent under false $H_i$. To examine the effect of dependency between the coordinates of $\xi_i$ under false $H_i$, we also simulate with $r = \pm 1/5$. Each simulation makes 2000 runs, and each run tests 5000 nulls. The (p)FDR and power are computed as Monte Carlo averages of the runs.
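The two error measures can be estimated from per-run counts as in the sketch below. It assumes the usual definitions, FDR $= E[V/(R \vee 1)]$ and pFDR $= E[V/R \mid R > 0]$, with $V$ the number of rejected true nulls and $R$ the number of rejections in a run; how the runs with $R = 0$ were treated in the actual simulations is not stated here, so this is only one plausible reading.

```python
def fdr_pfdr(runs):
    """Monte Carlo estimates of (FDR, pFDR) from a list of (V, R) pairs, one per run."""
    fdp = [v / max(r, 1) for v, r in runs]          # V / (R v 1) in every run
    fdr = sum(fdp) / len(fdp)
    positive = [v / r for v, r in runs if r > 0]    # runs with at least one rejection
    pfdr = sum(positive) / len(positive) if positive else float("nan")
    return fdr, pfdr
```

With this reading the FDR estimate equals the pFDR estimate times the fraction of runs with $R > 0$, so the two differ exactly when some runs make no rejection.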

We conduct 3 groups of simulations, corresponding to $(\mu, df) = (.6, .2, 8)$, $(.5, .5, 8)$ and $(2, 2, 2)$, respectively (Table 1). In each group, procedures (3.9) and (4.6) are implemented for 6 pairs of $a$ and $\Sigma(r)$, with $a = .05, .02$ and $r = 0, \pm 1/5$. For the pairs with $r = 0$, $\alpha_* = 1/[1 - a + a g(0)]$ is calculated using (5.4) and the results in Section 5.1. The value of $\alpha_*$ as well as those of $\varepsilon = 2/df$, $\gamma_1$ and $\gamma_2$ are given in Table 1. As is seen, for all the pairs $\alpha = .15 > \alpha_*$, so pFDR $= (1 - a)\alpha$ is attainable by the procedures. On the other hand, if only the $i$th ($i = 1, 2$) coordinate of the p-value is used, pFDR $= (1 - a)\alpha$ is not attainable because in this case the minimum attainable pFDR is $(1 - a)\alpha_{*i}$ with $\alpha_{*i} > .15$.

Table 1
Parameters of the simulations in Section 6.1. The values of $\alpha_*$, $\gamma$ and $\varepsilon = 2/df$ are computed for the case where the coordinates of p-value are independent.

                        a = .05                        a = .02
Group  mu, df           alpha_*        alpha_*1, *2    alpha_*        alpha_*1, *2    gamma            epsilon
1      (.6, .2), 8      9.37 x 10^-3   .18, .47        2.31 x 10^-2   .36, .69        (5.82, 3.51)     1/4
2      (.5, .5), 8      9.05 x 10^-3   .30, .30        2.23 x 10^-2   .52, .52        (46.81, 46.81)   1/4
3      (2, 2), 2        2.88 x 10^-2   .44, .44        6.90 x 10^-2   .67, .67        (27.69, 27.69)   1

For $r = 0$ and $\alpha \approx \alpha_*$, by Proposition 4.1 and (4.7), procedures (3.9) and (4.6) approximately reach their respective maximum power at pFDR level $(1 - a)\alpha$ if $\nu_k = \gamma_k$ in (3.9) and $c_k = c_k^* := (\bar\gamma/\gamma_k)^{1/\varepsilon}$ in (4.6). To see how the powers depend on $\nu_k$ and $c_k$ for $\alpha = .15$, the procedures are tested with

$$\begin{aligned} \nu &= (\nu_1, \nu_2) = (s^\varepsilon \gamma_1, \gamma_2/s^\varepsilon) && \text{for procedure (3.9)}, \\ c &= (c_1, c_2) = (s c_1^*, c_2^*/s) && \text{for procedure (4.6)}, \end{aligned} \tag{6.1}$$

where $s > 0$ is a tuning parameter. The reason why $s^\varepsilon$ instead of $s$ is used for $\nu$ will be seen later. For $r = \pm 1/5$, the procedures are tested with the same sets of values $\nu$ and $c$ as well. For groups 1 and 2, (3.13) is used to calculate $h(u; \nu)$ for procedure (3.9). For group 3, as $\varepsilon = 1$, (3.15) is used.

The plots of power and (p)FDR vs $\log_2 s$ are shown in Figures 1-3 and labeled with "e" and "r" for procedures (3.9) and (4.6), respectively. The label "e" refers to "ellipsoid", due to the similarity of the nested regions in procedure (3.9) to Euclidean ellipsoids. For most of the plots, $a = .05$. The results for $a = .02$ are qualitatively the same, except that the power is lower and the pFDR is harder to control. For illustration, Figure 1 includes the plots of power and (p)FDR for $(\mu, df) = (.6, .2, 8)$, $r = 0$ and $a = .02$.

The results show that for $\alpha$ significantly greater than $\alpha_*$, the power may still exhibit patterns similar to that for $\alpha \approx \alpha_*$. First, by Proposition 4.1 and (4.7), for $\alpha \approx \alpha_*$, the maximum power of procedure (3.9) is strictly greater than that of procedure (4.6). The left panels of Figures 1-3 show that this remains the case for $\alpha = .15$. Second, for $\nu$ and $c$ as in (6.1), as $\alpha \approx \alpha_*$, for both procedures, the power is approximately proportional to $(s^\varepsilon + 1/s^\varepsilon)^{-K/\varepsilon}$. As a result, the power curves of the procedures should be approximately symmetric, decreasing in $|\log_2 s|$ and parallel to each other. This holds quite well for $\alpha = .15$, except for the plots for procedure (4.6) in Figure 1, which exhibit moderate asymmetry that may be attributable to the unequal marginal distributions of the coordinates of the p-values.

In (6.1), it is necessary to use $s^\varepsilon$ to tune $\nu$ in order to get a power curve parallel to the one for procedure (4.6). For $t$ distributions with $df > 2$, $\varepsilon < 1$, suggesting that procedure (3.9) is more sensitive to the change in $\nu$ than procedure (4.6) is to the change in $c$. However, since $\varepsilon$ is known, as the results show, the sensitivity is easy to address. For $df = 2$, $\varepsilon = 1$ and hence the power of procedure (3.9) is uniformly greater than that of procedure (4.6) (Figure 3).

The results also demonstrate the difference between pFDR and FDR. In the simulations, the FDR remains constant. However, as power decreases, the pFDR increases, sometimes quite rapidly. Unless power is high enough, the pFDR is strictly greater than the FDR. Note that this observation is made when the number of tested nulls is 5000. In theory, if power is positive, then pFDR $\to$ FDR as the number of nulls tends to $\infty$. The observed discrepancy between the pFDR and FDR is due to the fact that the number of nulls is not large enough for the asymptotic to take effect.

Fig 1. Power and (p)FDR vs $\log_2 s$ for procedures (3.9) and (4.6): group 1 (cf. Section 6.1). Rows 1-3, a = .05, r = 0, 1/2, -1/2. Row 4, a = .02, r = 0. Legend: FDR(e), pFDR(e), FDR(r), pFDR(r).

Fig 2. Power and (p)FDR vs $\log_2 s$ for procedures (3.9) and (4.6): group 2 (cf. Section 6.1). a = .05, r = 0, 1/2, -1/2.

Finally, as seen from Figures 1 and 2, statistical dependency between the 
coordinates of the test statistics may have significant influence on power and 
pFDR control. Nevertheless, the modality and symmetry of the power curves are 
quite stable. Furthermore, the effects of correlations are not obvious in Figure 
3, where e = 1 and the joint distribution of the p-values is symmetric. This 
apparent stability remains to be explained. 



Z. Chi/FDR with multivariate p-values 



Fig 3. Power and (p)FDR vs $\log_2 s$ for procedures (3.9) and (4.6): group 3 (cf. Section 6.1). a = .05, r = 0, 1/2, -1/2.



6.2. Comparison with other procedures 

In this part, we take $K \ge 2$ and compare procedures (3.9) and (4.6) with the methods in Examples 4.1, 4.4 and 4.5. The first method rejects $H_i$ with small $\prod_k \xi_{ik}$, the second one rejects $H_i$ with large $\max_k T_{ik}$ and the third one rejects $H_i$ with large $\sum_k T_{ik}$. We refer to the methods as "by-product", "by-max" and "by-sum", respectively.

The basic setup in this part is as follows. For true $H_i$, $N(\mu_i, \Sigma_i) = N(0, I_K)$ and for false $H_i$, $N(\mu_i, \Sigma_i) = N(\mu, \Sigma(r))$, where $\Sigma(r) = [1 + (K - 1)r^2]^{-1} M'M$ with $M_{jk} = 1\{j = k\} + 1\{j \ne k\}\,r$. The diagonal entries of $\Sigma(r)$ are therefore 1. As in Section 6.1, FDR control parameter $\alpha = .15$, $a = .05, .02$ and $r = 0, \pm 1/5$.
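The covariance matrix of the false nulls can be built directly from its definition. The following sketch uses plain nested lists and a function name of our own choosing; it checks nothing beyond the algebra $\Sigma(r) = [1 + (K-1)r^2]^{-1} M'M$.

```python
def sigma(r, K):
    """Sigma(r) = M'M / (1 + (K - 1) r^2), with M_jk = 1 if j == k and r otherwise."""
    M = [[1.0 if j == k else r for k in range(K)] for j in range(K)]
    scale = 1.0 + (K - 1) * r * r
    return [[sum(M[i][j] * M[i][k] for i in range(K)) / scale
             for k in range(K)]
            for j in range(K)]
```

For $K = 2$ the off-diagonal entry is $2r/(1 + r^2)$, so the construction reduces to the one used in Section 6.1; for general $K$ the off-diagonal entries are $[2r + (K-2)r^2]/[1 + (K-1)r^2]$.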
Table 2
Parameters of the simulations in Section 6.2. The values of $\alpha_*$, $\gamma$ and $\varepsilon$ are computed for the case where the coordinates of p-values are independent.

Group  mu, df                    alpha_* (a = .05)   alpha_* (a = .02)   gamma                             epsilon
1      (.5, .65, .8), 4          8.58 x 10^-3        2.12 x 10^-2        (3.52, 4.93, 6.52)                1/2
2      (.6, .7, .8, .9, 1), 2    3.25 x 10^-3        8.09 x 10^-3        (4.47, 5.48, 6.57, 7.76, 9.04)    1
3      (.8, .8, .8, .8), 2       1.76 x 10^-2        4.29 x 10^-2        (6.57, 6.57, 6.57, 6.57)          1
4      (.6, .6, .6, .6), 3       9.57 x 10^-3        2.36 x 10^-2        (4.27, 4.27, 4.27, 4.27)          2/3
5      (2, 2), 2                 2.88 x 10^-2        6.90 x 10^-2        (27.69, 27.69)                    1
6      (1.5, 1.5), 3             9.73 x 10^-3        2.40 x 10^-2        (16.16, 16.16)                    2/3
7      (2, 3, 2), 10             9.4 x 10^-19        2.4 x 10^-18        (39.91, 82.27, 39.91)             1/5

We conduct 7 groups of simulations with the values of $(\mu, df)$ given in Table 2. Each simulation makes 3000 runs, and each run tests 6000 nulls. The (p)FDR and power are computed as Monte Carlo averages of the runs.

For $K > 2$, unless $\varepsilon = 1$, the evaluation of $h(u; \nu)$ in procedure (3.9) is rather difficult. To get around this problem, by (3.10), we approximate the procedure by replacing $h(u; \nu)$ with $V_\varepsilon (u/\bar\nu)^{K/\varepsilon}$ for all $u \in [0, 1]$. A more difficult issue is how to compare the procedures and the methods. One idea is to compare their powers at the same pFDR level $(1 - a)\alpha$. However, by Examples 4.1, 4.4 and 4.5, for the values of $(\mu, df)$ in Table 2, except for the 7th one, no method attains pFDR $\le \alpha = .15$. For this reason, we choose to examine the (p)FDR levels of the methods when they have the same power as either procedure at pFDR level $(1 - a)\alpha$. The steps are as follows. Take the by-product method and procedure (3.9) for example. Suppose the latter rejects $D$ false nulls when applied to the p-values. If $D > 0$, then sort $\prod_k \xi_{ik}$ in increasing order, keep rejecting the sorted nulls, starting from the first one, until $D$ false nulls are rejected; if $D = 0$, then reject no null. In this way, the number of rejected true nulls of the by-product method is minimized while the number of rejected false nulls is the same as procedure (3.9).
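The power-matching steps can be sketched as follows. The function name and the boolean flag list are ours; the flags mark which nulls are false, which is known in a simulation. For the by-max and by-sum methods, where large statistics lead to rejection, the negated statistics can be passed as scores.

```python
def matched_rejections(scores, is_false, target):
    """Reject in increasing order of score until `target` false nulls are rejected.

    Returns (V, R): the number of rejected true nulls and the total number of
    rejections. With target == 0 nothing is rejected.
    """
    if target == 0:
        return 0, 0
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    found = 0
    for rank, i in enumerate(order, start=1):
        if is_false[i]:
            found += 1
            if found == target:
                return rank - target, rank
    raise ValueError("fewer false nulls than the requested target")
```

The false discovery proportion of the matched method in a run is then $V/(R \vee 1)$, which can be averaged over runs to estimate its (p)FDR at the matched power.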

In each group, for each combination of $a$ and $\Sigma(r)$, procedures (3.9) and (4.6) are simulated with $\nu_k = \gamma_k$ and $c_k = (\bar\gamma/\gamma_k)^{1/\varepsilon}$, which are approximately the parameter values yielding maximum power for $\alpha \approx \alpha_*$. For each procedure, the by-product, by-max and by-sum methods are compared to it in the way described above. The results are reported in Tables 3-6. In groups 1-6, procedure (3.9) has more power than (4.6), often by a large margin. In all the cases where both procedures are able to control the pFDR around $(1 - a)\alpha$, the methods have substantially higher FDR and pFDR when their powers are matched to that of procedure (3.9) or (4.6). The results show that the methods either cannot control the pFDR at the level of $(1 - a)\alpha$ (which is indeed the case) or, alternatively, they can only control the pFDR with much lower power than procedures (3.9) and (4.6).

Unlike groups 1-6, in group 7, each coordinate of the vector of $t$-statistics provides strong evidence to identify false nulls. By only using the 1st or 3rd coordinate, the minimum attainable pFDR is $2.4 \times 10^{-5}$ for $a = .05$ and $6 \times 10^{-5}$
Table 3
Simulation results for groups 1 and 2 described in Section 6.2. In each group, the results are organized according to r = 0, 1/5, -1/5. The numbers in the rows for the "by-product" method are its FDR and pFDR as its power is pegged to procedure (3.9) or (4.6). The numbers in the rows for the "by-sum" and "by-max" methods are likewise.

                         a = .05                          a = .02
              Proc. (3.9)     Proc. (4.6)      Proc. (3.9)     Proc. (4.6)
Group 1
Power         .223            .155             7.70 x 10^-2    5.49 x 10^-2
FDR, pFDR     .141, .141      .142, .142       .150, .160      .144, .162
By-product    .250, .250      .204, .204       .250, .268      .212, .241
By-sum        .307, .307      .283, .283       .415, .444      .383, .435
By-max        .654, .654      .633, .633       .718, .768      .660, .749

Power         .351            .316             .210            .181
FDR, pFDR     .142, .142      .142, .142       .148, .148      .147, .148
By-product    .297, .297      .264, .264       .327, .327      .289, .289
By-sum        .333, .333      .310, .310       .447, .447      .427, .428
By-max        .721, .721      .709, .709       .836, .836      .826, .827

Power         4.61 x 10^-2    3.12 x 10^-2     1.10 x 10^-2    9.28 x 10^-3
FDR, pFDR     .143, .156      .148, .165       .141, .261      .150, .291
By-product    .194, .213      .174, .197       .114, .230      .114, .250
By-sum        .296, .326      .284, .321       .217, .440      .204, .447
By-max        .512, .564      .485, .548       .327, .663      .299, .654

Group 2
Power         .486            .329             .268            .173
FDR, pFDR     .142, .142      .143, .143       .147, .147      .147, .148
By-product    .337, .337      .242, .242       .395, .396      .313, .313
By-sum        .538, .538      .520, .520       .734, .734      .737, .739
By-max        .809, .809      .792, .792       .902, .902      .892, .894

Power         .580            .539             .457            .412
FDR, pFDR     .142, .142      .142, .142       .146, .146      .147, .147
By-product    .441, .441      .389, .389       .510, .510      .447, .447
By-sum        .588, .588      .567, .567       .744, .744      .731, .731
By-max        .852, .852      .845, .845       .926, .926      .923, .923

Power         .316            .138             3.91 x 10^-2    2.78 x 10^-2
FDR, pFDR     .143, .143      .142, .143       .150, .198      .145, .201
By-product    .291, .291      .221, .223       .211, .287      .189, .272
By-sum        .546, .546      .568, .572       .558, .759      .523, .752
By-max        .785, .785      .763, .767       .620, .845      .578, .830

for $a = .02$, and by only using the 2nd coordinate, the value is even lower. As Table 6 shows, procedures (3.9) and (4.6) identify all the false nulls. Since almost all the p-values of false nulls are smaller than those of true nulls, due to how the by-product method is implemented, it rejects very few true nulls and hence has near-zero (p)FDR. The same is true for the other two methods. The pFDR of procedure (3.9) is significantly lower than $(1 - a)\alpha$, because the approximation we use for $h(u; \nu)$, i.e. $V_\varepsilon (u/\bar\nu)^{K/\varepsilon}$, is strictly greater than $h(u; \nu)$
Table 4
Simulation results for groups 3 and 4 described in Section 6.2.

                         a = .05                          a = .02
              Proc. (3.9)     Proc. (4.6)      Proc. (3.9)     Proc. (4.6)
Group 3
Power         .272            .197             9.15 x 10^-2    6.76 x 10^-2
FDR, pFDR     .142, .142      .143, .143       .150, .158      .149, .162
By-product    .328, .328      .279, .279       .344, .363      .303, .330
By-sum        .570, .570      .572, .572       .733, .775      .713, .776
By-max        .789, .789      .781, .781       .833, .880      .802, .873

Power         .444            .409             .291            .257
FDR, pFDR     .142, .142      .142, .142       .149, .149      .150, .150
By-product    .404, .404      .368, .368       .455, .455      .416, .416
By-sum        .584, .584      .573, .573       .749, .749      .745, .745
By-max        .829, .829      .824, .824       .913, .913      .910, .910

Power         4.89 x 10^-2    3.57 x 10^-2     1.22 x 10^-2    1.01 x 10^-2
FDR, pFDR     .145, .158      .143, .160       .151, .268      .151, .284
By-product    .234, .256      .215, .243       .158, .308      .151, .321
By-sum        .574, .628      .558, .629       .388, .757      .357, .759
By-max        .680, .744      .651, .733       .417, .813      .381, .809

Group 4
Power         .170            .105             5.69 x 10^-2    3.75 x 10^-2
FDR, pFDR     .142, .142      .144, .144       .145, .161      .144, .170
By-product    .285, .285      .227, .228       .282, .316      .232, .279
By-sum        .440, .441      .433, .434       .571, .639      .524, .629
By-max        .761, .761      .746, .747       .763, .854      .693, .832

Power         .357            .322             .240            .207
FDR, pFDR     .142, .142      .143, .143       .147, .147      .147, .147
By-product    .363, .363      .320, .320       .411, .411      .358, .358
By-sum        .456, .456      .433, .433       .612, .612      .594, .594
By-max        .820, .820      .812, .812       .907, .907      .903, .903

Power         4.26 x 10^-3    4.02 x 10^-3     2.36 x 10^-3        2.21 x 10^-3
FDR, pFDR     .145, .294      .149, .300       .149, .492          .160, .530
By-product    .102, .231      .105, .239       5.58 x 10^-2, .278  6.30 x 10^-2, .335
By-sum        .217, .492      .217, .496       .119, .594          .119, .635
By-max        .273, .618      .272, .621       .143, .712          .139, .737

when $u > \min_k \nu_k$ and hence inflates $s_i$ in (3.9). This causes the BH procedure to reject more nulls with $s_i$ not very close to 0. As these nulls are exclusively true nulls, the resulting (p)FDR is lower.

Finally, in order to see how procedures (3.9) and (4.6) perform when the parameters $\nu$ and $c$ are not set to their respective asymptotically optimal values, we simulate groups 1 and 2 again, with $\nu = c = (1, 1, \ldots, 1)$. As Table 7 shows, across the simulations, for each procedure, the power is lower than in Table 3 but not dramatically so, while the (p)FDR is quite stable. The (p)FDR levels of the other 3 methods tend to be lower than in Table 3. As in group 7, this can be explained by how the methods are implemented.
Table 5
Simulation results for groups 5 and 6 described in Section 6.2.

                         a = .05                          a = .02
              Proc. (3.9)     Proc. (4.6)      Proc. (3.9)     Proc. (4.6)
Group 5
Power         .772            .733             .372            .342
FDR, pFDR     .143, .143      .143, .143       .149, .149      .148, .148
By-product    .267, .267      .246, .246       .280, .281      .271, .271
By-sum        .387, .387      .373, .373       .529, .530      .527, .528
By-max        .585, .585      .570, .570       .697, .698      .691, .692

Power         .773            .755             .432            .411
FDR, pFDR     .142, .142      .143, .143       .149, .149      .148, .148
By-product    .283, .283      .272, .272       .295, .295      .288, .288
By-sum        .403, .403      .395, .395       .535, .536      .532, .533
By-max        .607, .607      .599, .599       .716, .717      .712, .712

Power         .767            .731             .369            .344
FDR, pFDR     .142, .142      .142, .142       .147, .148      .148, .149
By-product    .266, .266      .247, .247       .281, .282      .272, .273
By-sum        .386, .386      .374, .374       .529, .532      .527, .529
By-max        .584, .584      .570, .570       .697, .700      .691, .695

Group 6
Power         .776            .722             .497            .447
FDR, pFDR     .143, .143      .143, .143       .146, .146      .147, .147
By-product    .239, .239      .207, .207       .259, .259      .237, .237
By-sum        .295, .295      .269, .269       .391, .391      .377, .377
By-max        .540, .540      .513, .513       .654, .654      .639, .639

Power         .771            .747             .538            .510
FDR, pFDR     .142, .142      .142, .142       .148, .148      .149, .149
By-product    .258, .258      .240, .240       .278, .278      .263, .263
By-sum        .315, .315      .301, .301       .410, .410      .399, .399
By-max        .571, .571      .557, .557       .685, .685      .676, .676

Power         .773            .711             .476            .425
FDR, pFDR     .143, .143      .143, .143       .148, .148      .148, .148
By-product    .233, .233      .200, .200       .257, .257      .235, .235
By-sum        .288, .288      .262, .262       .388, .388      .375, .375
By-max        .530, .530      .501, .501       .646, .646      .631, .631



7. Discussion 

7.1. Role of p-values 

We have followed the tradition of using p-values for hypothesis testing. The general procedure in the work, i.e., (3.2), utilizes the fact that the p-value of a continuous multivariate statistic can be defined in such a way that its coordinates are i.i.d. $\sim \mathrm{Unif}(0, 1)$. The interpretation of a p-value as a measure of how "rare" or "suspicious" an observation looks is irrelevant, even though in many cases smaller p-values are indeed more likely to be associated with false nulls.
Table 6
Simulation results for group 7 described in Section 6.2. "-" means value equal to the nonmissing value in the same row.

                         a = .05                          a = .02
Group 7       Proc. (3.9)       Proc. (4.6)    Proc. (3.9)       Proc. (4.6)
Power         1                 1              1                 1
FDR, pFDR     .103, .103        .142, .142     .118, .118        .146, .146
By-product    0, -                             0, -
By-sum        1.2 x 10^-6, -                   5.0 x 10^-6, -
By-max        3.1 x 10^-3, -                   5.4 x 10^-3, -

Power         1                 1
FDR, pFDR     .103, .103        .143, .143     .118, .118        .148, .148
By-product    1.2 x 10^-6, -                   0, -
By-sum        3.6 x 10^-6, -                   1.3 x 10^-5, -
By-max        5.7 x 10^-3, -                   9.6 x 10^-3, -

Power         1                 1              1                 1
FDR, pFDR     .103, .103        .143, .143     .119, .119        .148, .148
By-product    0, -                             0, -
By-sum        0, -                             2.7 x 10^-6, -
By-max        2.9 x 10^-3, -                   4.9 x 10^-3, -

Thus, in this work, p-values serve as a mechanism to "flatten" the probability 
landscape of true nulls and hence facilitates exploring subtle differences between 
true and false nulls. 

Since what essentially matters to procedure (3.2) is nested events with specific 
probabilities, it can be easily modified to directly handle test statistics instead 
of their p-values. Indeed, in (3.2), $D_t \subset [0,1]^K$ can be replaced with nested $E_t$ in 
the domain of the test statistics, such that $P(E_t) = t$ under true nulls. Analysis 
of the power of the modification might yield some useful insight. For example, 
weighted $L^2$ norms are commonly used as criteria for acceptance/rejection. 
However, as shown in Examples 4.5 and 4.6, in more challenging cases, one may 
need to consider $L^p$ norms with p < 0. On the other hand, the modification does 
not simplify the testing problem, as probabilities still have to be evaluated. 
Nevertheless, as remarked next, the notion of using nested regions in spaces 
other than $[0,1]^K$ is useful. 
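
As a minimal sketch of nested events defined directly on a test statistic, the following Python fragment assumes a scalar statistic that is N(0, 1) under true nulls and takes $E_t = \{x : |x| \ge c(t)\}$. The null distribution, the two-sided shape of $E_t$ and the function names are illustrative assumptions, not part of procedure (3.2) itself.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed null distribution of a scalar test statistic


def cutoff(t: float) -> float:
    """Cutoff c(t) such that E_t = {x : |x| >= c(t)} has null probability t."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1.0 - t / 2.0)


def null_probability(c: float) -> float:
    """P(|T| >= c) for T ~ N(0, 1)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(c))


def in_region(x: float, t: float) -> bool:
    """Membership of an observed statistic x in E_t."""
    return abs(x) >= cutoff(t)


if __name__ == "__main__":
    for t in (0.01, 0.05, 0.2):
        c = cutoff(t)
        print(t, round(c, 4), round(null_probability(c), 6))
```

Because c(t) decreases in t, the regions are nested, and each has probability t under the assumed null, which is all the modified procedure requires of $E_t$.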

7.2. Incorporating discrete components 

Often, test statistics have nontrivial discrete components. For example, 
test statistics for different nulls may have different dimensions or degrees of 
freedom. In this case, the discrete component may be expressed as a scalar. 
However, if the test statistics are multivariate but only partially observed, then 
the discrete component in general has to be set-valued, accounting for the observed 
coordinates. Procedure (3.2) can be modified as follows. Suppose Z is the discrete 
component of test statistic T such that for any z, the conditional distribution 
of T given Z = z has a density and is K(z) dimensional. Then, in 

Table 7 

Simulation results for groups 1 and 2. The setting is similar to that in Table 3, except that $\nu$ 
in (3.9) and c in (4.6) are set equal to (1,1,..., 1) instead of according to $\gamma$. 

                          a = .05                           a = .02
Group 1        Proc. (3.9)       Proc. (4.6)       Proc. (3.9)       Proc. (4.6)

Power          .206              .138              6.94 × 10^{-2}    4.88 × 10^{-2}
FDR, pFDR      .143, .143        .143, .143        .146, .159        .144, .163
By-product     [illegible]       .191, .191        [illegible]       [illegible]
By-sum         [illegible]       [illegible]       [illegible]       [illegible]
By-max         .650, .650        .626, .627        .700, .762        .657, .746

Power          .340              .293              .199              .166
FDR, pFDR      .143, .143        .143, .143        .147, .147        .146, .146
By-product     .286, .286        .242, .242        .315, .315        .271, .272
By-sum         [illegible]       [illegible]       [illegible]       [illegible]
By-max         .717, .717        .701, .701        .835, .835        .823, .824

Power          3.78 × 10^{-2}    2.55 × 10^{-2}    1.06 × 10^{-2}    8.55 × 10^{-3}
FDR, pFDR      .140, .155        .142, .166        .146, .279        .144, .289
By-product     .175, .196        .154, .182        .107, .227        .108, .247
By-sum         [illegible]       [illegible]       [illegible]       [illegible]
By-max         .495, .554        .453, .537        .313, .666        .286, .654

Group 2        Proc. (3.9)       Proc. (4.6)       Proc. (3.9)       Proc. (4.6)

Power          .458              .300              .248              .155
FDR, pFDR      .142, .142        .142, .142        .147, .147        .148, .148
By-product     .317, .317        .226, .226        .378, .378        .296, .297
By-sum         .533, .533        .519, .519        .734, .734        .739, .741
By-max         .806, .806        .788, .788        .901, .901        .889, .892

Power          .562              .508              .441              .388
FDR, pFDR      .142, .142        .142, .142        .145, .145        .145, .145
By-product     .420, .420        .353, .353        .483, .483        .412, .412
By-sum         .581, .581        .554, .554        .738, .738        .723, .723
By-max         .850, .850        .840, .840        .925, .925        .920, .920

Power          .251              .112              3.34 × 10^{-2}    2.35 × 10^{-2}
FDR, pFDR      .141, .141        .141, .142        .149, .204        .146, .212
By-product     .266, .266        .207, .209        .200, .281        .176, .264
By-sum         .551, .552        .574, .580        .537, .755        .500, .754
By-max         .778, .780        .757, .765        .595, .838        .550, .829


(3.2), redefine $D_t$ as nested subsets in the disjoint union of $[0,1]^{K(z)}$, such 
that $\sum_z \ell(D_t \cap [0,1]^{K(z)})\, p_z = t$, where $p_z$ is the probability of Z = z under true 
nulls. The analysis in previous sections still works and requires no substantial 
extra changes. 

An apparently simpler alternative is to conduct separate tests on statistics 
with different values of the discrete components. This alternative fails to take 
into account the distribution of the discrete components and hence may have 
lower power. 

7.3. Power optimization 

When the distribution under false nulls is only partially known, attaining 
maximum power can be difficult. To see this better, consider testing 
the null "$\mu = 0$" for $N(\mu, I)$ based on a single observation X. As the variance 
is known to be I, the most powerful test statistic would be $\nu'X$, provided that 
the true value $\nu$ of $\mu$ under false nulls is known. However, when $\nu$ is unknown, 
unless there is strong evidence on its whereabouts, one has to search in a large 
region of $\mu$ to improve the power, which becomes more difficult as the dimension 
of $\nu$ gets higher. 
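
To illustrate how much the choice of direction matters, the sketch below computes the exact power of a one-sided test that rejects when $u'X/|u|$ is large, for $X \sim N(\nu, I)$. The function name, the level 0.05 and the example vectors are hypothetical choices made only for this illustration.

```python
import math
from statistics import NormalDist


def directional_power(nu, u, level=0.05):
    """Power of the level-`level` one-sided test based on u'X / |u|, X ~ N(nu, I).

    Under the null (mean 0) the statistic is N(0, 1); under the alternative it is
    N(u'nu / |u|, 1), so the power is 1 - Phi(z_{1-level} - u'nu / |u|).
    """
    norm_u = math.sqrt(sum(c * c for c in u))
    if norm_u == 0.0:
        raise ValueError("u must be a nonzero vector")
    shift = sum(a * b for a, b in zip(nu, u)) / norm_u
    z = NormalDist().inv_cdf(1.0 - level)
    return 1.0 - NormalDist().cdf(z - shift)


if __name__ == "__main__":
    nu = (2.0, 0.0, 0.0, 0.0)                                 # hypothetical alternative mean
    print(directional_power(nu, nu))                          # direction known exactly
    print(directional_power(nu, (1.0, 1.0, 1.0, 1.0)))        # uninformed direction
    print(directional_power(nu, (0.0, 1.0, 0.0, 0.0)))        # orthogonal direction
```

The power depends on $u$ only through the projection $u'\nu/|u|$, so a direction orthogonal to $\nu$ has power equal to the level, which is why the search region for the direction grows in importance with the dimension.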

One way to improve power is to restrict the search to parametric families 
of nested regions. This is the approach taken in Section 3.3. If the parameter 
involved is of high dimension, some type of stochastic optimization [19] may be 
needed. On the other hand, regions that attain maximum power may consist of 
several disconnected regions, which makes it difficult to use a single parametric 
family of nested regions to approximate them. An alternative way therefore is 
to try different families of nested regions at different locations in the domain of 
p-values and combine the results appropriately [7]. 



Appendix 

In this section, we shall denote $I = [0,1]^K$. 



A.1. Theoretical details for Section 3 

Proof of Proposition 3.1. Since $D_0 = \emptyset$ and $D_1 = I$, it suffices to show that $D_t$, 
$t \in (0,1)$, satisfy (3.1). Observe that $h^{-1}$ is continuous and strictly increasing 
on (0, 1). Then, as $\Gamma_u$ is right-continuous, $D_t$ is right-continuous. It is clear that 
$D_t$ is increasing and $\ell(D_t) = \ell(\Gamma_{h^*(t)}) = h(h^*(t)) = t$. □ 

Proof of Lemma 3.1. 1) The following "sandwiched convergence" is needed: if 
$0 \le a_n(x) \le b_n(x)$ such that $a_n(x) \to a(x)$, $b_n(x) \to b(x)$ a.e. and $\int b_n \to \int b < \infty$, 
then $\int a_n \to \int a$. For each k, let 

$$g(x_1,\ldots,x_k) = \int_{-\infty}^{x_k} f_k(x_1,\ldots,x_{k-1},z)\,dz = \int f_k(x_1,\ldots,x_{k-1},z)\,1\{z \le x_k\}\,dz.$$

The function in the second integral is dominated by $f_k(x_1,\ldots,x_{k-1},z)$. If 
$(x_1,\ldots,x_k) \to (y_1,\ldots,y_k)$, then, by the continuity of $f_k$ and $f_{k-1}$, 

$$f_k(x_1,\ldots,x_{k-1},z)\,1\{z \le x_k\} \to f_k(y_1,\ldots,y_{k-1},z)\,1\{z \le y_k\}, \quad \text{for } z \ne y_k,$$
$$f_k(x_1,\ldots,x_{k-1},z) \to f_k(y_1,\ldots,y_{k-1},z),$$
$$\int f_k(x_1,\ldots,x_{k-1},z)\,dz = f_{k-1}(x_1,\ldots,x_{k-1}) \to \int f_k(y_1,\ldots,y_{k-1},z)\,dz.$$

By the sandwiched convergence, $g(x_1,\ldots,x_k) \to g(y_1,\ldots,y_k)$. Thus, g is continuous. 
As $\phi_k(x) = g(x_1,\ldots,x_k)/f_{k-1}(x_1,\ldots,x_{k-1})$ for $x \in \mathrm{sppt}(q_0)$, $\phi_k(x)$ 
is continuous in $\mathrm{sppt}(q_0)$. Therefore, $\phi \in C(\mathrm{sppt}(q_0))$. 

Let $x, y \in \mathrm{sppt}(q_0)$. Suppose $x \ne y$ such that $x_i = y_i$ for $i < k$ and $x_k < y_k$. 
Then $\phi_k(x) \le \phi_k(y)$. Assume equality holds. Then 

$$0 = [\phi_k(y) - \phi_k(x)]\,f_{k-1}(x_1,\ldots,x_{k-1}) = \int_{x_k}^{y_k}\!\int q_0(x_1,\ldots,x_{k-1},z,u_{k+1},\ldots,u_K)\,du_{k+1}\cdots du_K\,dz.$$

Since $q_0$ is continuous, the above formula implies $q_0(x_1,\ldots,x_{k-1},z,u_{k+1},\ldots,u_K) = 0$ 
for $z \in [x_k, y_k]$ and $u_{k+1},\ldots,u_K \in \mathbb{R}$, in particular, $q_0(x) = q_0(y) = 0$. 
The contradiction implies $\phi_k(x) < \phi_k(y)$ and so $\phi(x) \ne \phi(y)$. 

2) Let $X \sim Q_0$. Then $P(X \in \mathrm{sppt}(q_0)) = 1$. Since $\phi \in C(\mathrm{sppt}(q_0))$, $\xi = \phi(X)$ 
is a well-defined random variable. For $x \in \mathrm{sppt}(q_0)$, by 1), conditional on 
$X_i = x_i$, $i < k$, $X_k$ has a continuous distribution and hence $\xi_k \sim \mathrm{Unif}(0,1)$. 
Since the conditional distribution of $\xi_k$ is the same regardless of $x_1,\ldots,x_{k-1}$, 
$\xi_k$ is independent of $X_1,\ldots,X_{k-1}$ and thus independent of $\xi_1,\ldots,\xi_{k-1}$. This 
gives $\xi \sim \mathrm{Unif}(I)$. 

3) Let $X \sim Q_1$. As in 2), $\xi = \phi(X)$ is a well-defined random variable. Denote 
$r = q_1/q_0$. For $t \in \mathbb{R}^K$, 

$$E_{Q_1}\big[e^{it'\xi}\big] = E_{Q_0}\big[e^{it'\xi}\,r(X)\big] = E_{Q_0}\big[e^{it'\xi}\,r(\phi^{-1}(\xi))\big],$$

where the first equality holds since $\mathrm{sppt}(q_1) \subset \mathrm{sppt}(q_0)$ and $r(X)$ is a well-defined 
random variable due to $r \in C(\mathrm{sppt}(q_0))$. Since $\phi$ is 1-to-1 and continuous 
on $\mathrm{sppt}(q_0)$, $E := \phi(\mathrm{sppt}(q_0))$ is open and $\phi^{-1} \in C(E)$ [15]. As $\ell(I \setminus E) = Q_0(\xi \notin E) = 0$, 
$r(\phi^{-1}(x))$ is Borel measurable on I and by 2), the last expectation 
equals $\int_I e^{it'u}\,r(\phi^{-1}(u))\,du$. Thus, the characteristic function of $\xi$ under $Q_1$ is 
the same as that of a random variable with density $r(\phi^{-1}(u))$, $u \in I$. Since 
$r(\phi^{-1}(u)) \in C(E)$, this proves 3). □ 

To show Proposition 3.2, we need a few preliminary results. Recall $p_i = h(g(\xi_i))$. 

Lemma A.1.1. Let h be continuous. Then 1) $p_i \le t \iff \xi_i \in D_t$ and 2) under 
the assumptions of Lemma 3.1, 

$$P(p_i \le t) = \begin{cases} t & \text{if } H_i \text{ is true} \\ G(D_t) & \text{if } H_i \text{ is false} \end{cases}$$

and $G(D_t)$ is strictly concave. 

Proof. It can be seen that $p_i$ is a well-defined random variable and for $t \in [0,1]$, 
$h(h^*(t)) = t$. Then $p_i \le t \iff g(\xi_i) \ge h^*(t) \iff \xi_i \in D_t$. Under true $H_i$, 
$P(p_i \le t) = \ell(D_t) = t$. Under false $H_i$, $P(p_i \le t) = \int_{D_t} g = G(D_t)$. Given 
$0 < t_1 < t_2 < t_3 < 1$, let $u_k = h^*(t_k)$. By the continuity of h, $t_k = h(u_k)$ and 
$u_1 > u_2 > u_3$. As $D_t = \Gamma_{h^*(t)}$, 

$$r_k := \frac{G(D_{t_{k+1}}) - G(D_{t_k})}{t_{k+1} - t_k} = \frac{1}{h(u_{k+1}) - h(u_k)} \int_{\Gamma_{u_{k+1}} \setminus \Gamma_{u_k}} g.$$

Since $h(u_k) = \ell(\Gamma_{u_k})$ and $g(x) \in (u_{k+1}, u_k]$ on $\Gamma_{u_{k+1}} \setminus \Gamma_{u_k}$, $u_{k+1} < r_k < u_k$. As 
a result, $r_1 > r_2$. Therefore, the distribution of $p_i$ is strictly concave. □ 

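Lemma A.1.2. Let $\eta_1,\ldots,\eta_n$ be independent Bernoulli random variables such 
that $p_i = P(\eta_i = 1)$ are decreasing. Let $S = \eta_1 + \cdots + \eta_n$. Then $E[\eta_i/(S \vee 1)]$ is 
decreasing. 

Proof. Let $i < j$. Then $\eta_i$, $\eta_j$ and $X = S - \eta_i - \eta_j$ are independent, giving 

$$E\left[\frac{\eta_i}{S \vee 1}\right] = p_i\,E\left[\frac{1}{1 + \eta_j + X}\right], \qquad E\left[\frac{\eta_j}{S \vee 1}\right] = p_j\,E\left[\frac{1}{1 + \eta_i + X}\right].$$

Since $p_i \ge p_j$, $(1 + \eta_j + X)^{-1}$ stochastically dominates $(1 + \eta_i + X)^{-1}$. Therefore, 
$p_i\,E[(1 + \eta_j + X)^{-1}] \ge p_j\,E[(1 + \eta_i + X)^{-1}]$. 
□ 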
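
The monotonicity in Lemma A.1.2 can be checked numerically for small n by exact enumeration. The following is only an illustrative sketch; the probabilities used are hypothetical and the function name is not from the paper.

```python
from itertools import product


def expected_ratios(p):
    """Exact E[eta_i / max(S, 1)] for independent Bernoulli(p_i), by enumeration."""
    n = len(p)
    out = [0.0] * n
    for eta in product((0, 1), repeat=n):
        # Probability of this particular 0/1 configuration.
        prob = 1.0
        for e, q in zip(eta, p):
            prob *= q if e else 1.0 - q
        s = max(sum(eta), 1)
        for i, e in enumerate(eta):
            out[i] += prob * e / s
    return out


if __name__ == "__main__":
    p = [0.9, 0.6, 0.5, 0.2, 0.1]  # hypothetical decreasing success probabilities
    print(expected_ratios(p))
```

As a sanity check on the enumeration, the ratios sum to $P(S \ge 1)$, since $\sum_i \eta_i/(S \vee 1)$ equals 1 when $S \ge 1$ and 0 otherwise.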



Lemma A.1.3. Let $s_n, \eta_1,\ldots,\eta_n \in [0,1]$ be jointly distributed, such that $s_n \xrightarrow{P} s \in [0,1]$ 
as $n \to \infty$ and $\eta_i$ are i.i.d. $\sim F$. Let $F_n$ be the empirical distribution 
of $\eta_1,\ldots,\eta_n$. If F is continuous and strictly increasing on [0, 1], then $F_n^*(s_n) \xrightarrow{P} F^*(s)$. 

Proof. Recall $\sup|F_n - F| \xrightarrow{P} 0$. Let $x_n = F_n^*(s_n)$ and $x = F^*(s)$. Since F is 
continuous, $s = F(x)$. Suppose $x \in (0,1)$. Given $\varepsilon \in (0, x)$, by $F_n(x_n - \varepsilon) < s_n$, 
$\{x_n > x + 2\varepsilon\} \subset \{s_n > F_n(x + \varepsilon)\}$. By $s_n \xrightarrow{P} s = F(x)$ and $F_n(x + \varepsilon) \xrightarrow{P} F(x + \varepsilon) > F(x)$, 
$P(\{x_n > x + 2\varepsilon\}) \to 0$. On the other hand, $\{x_n < x - \varepsilon\} \subset \{s_n \le F_n(x - \varepsilon)\}$. 
By $F_n(x - \varepsilon) \xrightarrow{P} F(x - \varepsilon) < F(x)$, $P(\{x_n < x - \varepsilon\}) \to 0$. 
Therefore, $x_n \xrightarrow{P} x$. The case where x = 0 or 1 is similarly proved. □ 

Proof of Proposition 3.2. First, since $h \in C$ is decreasing, 

$$\bigcap_{s>t} D_s = \bigcap_{s>t} \Gamma_{h^*(s)} = \bigcap_{s>t} \{g(x) \ge h^*(s)\} = \bigcap_{s>t} \{h(g(x)) \le s\} = \{h(g(x)) \le t\} = \{g(x) \ge h^*(t)\} = D_t,$$

proving the right-continuity of $D_t$. By the continuity of h, $\ell(D_t) = \ell(\Gamma_{h^*(t)}) = h(h^*(t)) = t$. 
The rest of (3.1) is easy to check. 

Denote by N the number of true nulls, and for any given procedure, denote 
by R and V the numbers of rejected nulls and rejected true nulls, respectively. 
We shall show i) procedure (3.2) with $D_t$ satisfies conditions (A) and (B) and 
attains FDR = $(1 - a)\alpha$; ii) the search for procedures with maximum power can 
be restricted to those that reject and only reject nulls with largest $g(\xi_i)$; and 
iii) for such a procedure, R/n converges in probability to a nonrandom number. 
From these results, the proof will follow without much difficulty. 

i) By Lemma A.1.1 1), procedure (3.2) with $D_t = \Gamma_{h^*(t)}$ is the same as the BH 
procedure applied to $p_1,\ldots,p_n$. Therefore, statement 1) holds and FDR = $(1 - a)\alpha$ [2, 22]. 
Since the set of rejected nulls is uniquely determined by $\xi_1,\ldots,\xi_n$, 
the procedure satisfies condition (A). 

Recall $\alpha_* = 1/(1 - a + a \sup g)$. By Lemma A.1.1 2), $F(t) = (1 - a)t + aG(D_t)$ 
is strictly concave. Then 

$$\alpha_* = \lim_{u \to \sup g} \frac{1}{1 - a + aG(\Gamma_u)/\ell(\Gamma_u)} = \frac{1}{F'(0)}.$$

For $\alpha \in (\alpha_*, 1)$, $t/\alpha = F(t)$ has a unique positive solution $t_* \in (0,1)$. Since 
$p_i$ are i.i.d. $\sim F$, by [12], for $\tau$ in (3.2), $\tau \xrightarrow{P} t_*$ as $n \to \infty$ and procedure (3.2) 
asymptotically has the same power as the one that rejects $H_i$ with $p_i \le t_*$. By 
the law of large numbers, $R/n \xrightarrow{P} F(t_*)$. On the other hand, for $\alpha \le \alpha_*$, by [7], 
$R/n \xrightarrow{P} 0$. In either case, condition (B) is satisfied. 
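
The convergence of the BH threshold to the solution of $t/\alpha = F(t)$ can be illustrated by simulation. The sketch below uses a toy model that is an assumption of this example only: p-values under false nulls have distribution function $G(t) = 2t - t^2$ (so $\sup g = 2$), the fraction of false nulls is a = 0.5, and $\alpha = 0.8$, which exceeds $\alpha_* = 1/(1 + a) = 2/3$. In this model the positive solution has the closed form $t_* = (1 + a - 1/\alpha)/a$.

```python
import math
import random


def limit_threshold(a: float, alpha: float) -> float:
    """Positive solution of t / alpha = (1 - a) t + a (2 t - t^2) in the toy model."""
    return (1.0 + a - 1.0 / alpha) / a


def bh_threshold(pvals, alpha: float) -> float:
    """BH threshold alpha * R / n, with R the largest k such that p_(k) <= alpha k / n."""
    n = len(pvals)
    r = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * k / n:
            r = k
    return alpha * r / n


def simulate_pvalues(n: int, a: float, seed: int = 0):
    """Draw n p-values: Unif(0, 1) under true nulls, G(t) = 2t - t^2 under false nulls."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        if rng.random() < a:
            out.append(1.0 - math.sqrt(1.0 - u))  # inverse of G applied to a uniform
        else:
            out.append(u)
    return out


if __name__ == "__main__":
    a, alpha = 0.5, 0.8
    print(limit_threshold(a, alpha))
    print(bh_threshold(simulate_pvalues(20000, a), alpha))
```

With these toy values $t_* = 0.5$ and $F(t_*) = 0.625$, so the simulated threshold should be close to 0.5 and the rejected fraction close to 0.625 for large n.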

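ii) Given $\xi_1,\ldots,\xi_n$, $\theta_1,\ldots,\theta_n$ are independent Bernoulli variables with 

$$r_i := P(\theta_i = 1 \mid \xi_i) = \frac{a\,g(\xi_i)}{1 - a + a\,g(\xi_i)}.$$

Sort $r_i$ into $r_{(1)} \ge r_{(2)} \ge \cdots \ge r_{(n)}$. Let $\delta$ be a procedure satisfying condition (A). 
Then $R = \sum_{i=1}^n \delta_i$ and $V = \sum_{i=1}^n (1 - \theta_i)\delta_i$. By condition (A), $\theta_i$ is 
conditionally independent of $(\delta_i, R)$ given $\xi_1,\ldots,\xi_n$. Then 

$$E[V \mid \xi_1,\ldots,\xi_n, R] = \sum_{i=1}^n E[(1 - \theta_i)\delta_i \mid \xi_1,\ldots,\xi_n, R] = \sum_{i=1}^n (1 - r_i)\,E[\delta_i \mid \xi_1,\ldots,\xi_n, R] \ge \sum_{i=1}^R (1 - r_{(i)}),$$

where the last inequality is due to $R = \sum_{i=1}^n \delta_i$. Then 

$$\mathrm{FDR} = E\left[\frac{E[V \mid \xi_1,\ldots,\xi_n, R]}{R \vee 1}\right] \ge E\left[\frac{1}{R \vee 1} \sum_{i=1}^R (1 - r_{(i)})\right],$$

with equality being true if rejected nulls are exactly those with the R largest $r_i$. 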
On the other hand, since N = JTJiLi by Lemma A. 1.2, 



R — V 



7VV1 

n 
i=l 

i=l 



J2 E 



r ^ 






TV V 1 


[ *w 






_NVl 



7VV 1 
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where 9n\ corresponds to the null with the ith largest r^. Then 





' R 








power < E 










NVl 






.1=1 







Note that $r_i \ge r_j \iff g(\xi_i) \ge g(\xi_j) \iff p_i \le p_j$. Construct procedure $\delta'$ 
which first applies $\delta$ and then, provided $\delta$ rejects R nulls, rejects nulls with the 
R smallest $p_i$ instead. It follows that 1) if $\delta$ has FDR $\le (1 - a)\alpha$, then so does 
$\delta'$; 2) $\delta'$ is at least as powerful as $\delta$; 3) if $\delta$ satisfies condition (A), then, as the 
second step of $\delta'$ is uniquely determined by $\xi_1,\ldots,\xi_n$, $\delta'$ satisfies condition (A) 
as well; and 4) since $\delta$ and $\delta'$ reject the same number of nulls, if one satisfies 
condition (B), the other does as well. 



iii) Let $\delta$ satisfy conditions (A), (B) and attain maximum power asymptotically 
while having FDR $\le (1 - a)\alpha$. As $n \to \infty$, the empirical distribution 
of $\xi_1,\ldots,\xi_n$ converges to the distribution that has density $1 - a + a\,g(x)$. By 
condition (B), $R/n \xrightarrow{P}$ some $s \in [0,1]$. Let $F_n$ be the empirical distribution 
of $p_i$. Then $F_n \xrightarrow{P} F$. Since F is strictly concave and continuous, F is strictly 
increasing. By Lemma A.1.3, $F_n^*(R/n) \xrightarrow{P} F^*(s)$. Since $\delta$ rejects and only rejects 
nulls with $p_i \le F_n^*(R/n)$, $\delta$ is asymptotically equivalent to a procedure which 
rejects and only rejects nulls with $p_i \le t_* = F^*(s)$. If $t_* > 0$, by the law of large 
numbers and dominated convergence, as $n \to \infty$, 

$$\mathrm{FDR} = (1 + o(1))\,E\left[\frac{\#\{i : p_i \le t_*,\ \theta_i = 0\}}{\#\{i : p_i \le t_*\} \vee 1}\right] \to \frac{(1 - a)\,t_*}{F(t_*)},$$
$$\mathrm{power} = (1 + o(1))\,E\left[\frac{\#\{i : p_i \le t_*,\ \theta_i = 1\}}{\#\{i : \theta_i = 1\} \vee 1}\right] \to G(D_{t_*}).$$

In order to attain maximum power while maintaining FDR $\le (1 - a)\alpha$, $t_*$ has 
to be the largest value of t satisfying $t/F(t) \le \alpha$. It is easy to see that for 
$\alpha \in (\alpha_*, 1)$, $t_*$ is the unique positive solution of $t/\alpha = F(t)$. Combined with 
part i) of the proof, this shows procedure (3.2) with $D_t = \Gamma_{h^*(t)}$ can be taken 
as $\delta$. Furthermore, in this case, power $\to G(D_{t_*}) > 0$ and since $P(R > 0) \to 1$, 
pFDR = (1 + o(1))FDR $\to (1 - a)\alpha$. Thus 2) is proved. 

On the other hand, for $\alpha \le \alpha_*$, no $t > 0$ satisfies $t/F(t) \le \alpha$. As a result, $t_* = 0$. 
Thus, the power of $\delta$ is asymptotically 0 and procedure (3.2) with $D_t$ again 
can be taken as $\delta$. Furthermore, by [7], the procedure has pFDR $\to (1 - a)\alpha_*$. 
This proves 3). 
□ 



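Proof of Lemma 3.2. By change of variable $x_k = (u/\nu_k)^{1/\varepsilon} z_k$, for $0 < u < \min_k \nu_k$, 

$$h(u;\nu) = \int 1\{\nu' x^\varepsilon \le u\}\,dx = \int 1\Big\{\sum_k z_k^\varepsilon \le 1\Big\} \prod_k \left(\frac{u}{\nu_k}\right)^{1/\varepsilon} dz,$$

which yields (3.10). Likewise, 

$$\int x_k^\varepsilon\,1\{\nu' x^\varepsilon \le u\}\,dx = \frac{u^{K/\varepsilon+1}}{\nu_k \prod_j \nu_j^{1/\varepsilon}} \int z_k^\varepsilon\,1\Big\{\sum_j z_j^\varepsilon \le 1\Big\}\,dz.$$

By symmetry, 

$$\int z_k^\varepsilon\,1\Big\{\sum_j z_j^\varepsilon \le 1\Big\}\,dz = \frac{1}{K} \int \Big(\sum_j z_j^\varepsilon\Big)\,1\Big\{\sum_j z_j^\varepsilon \le 1\Big\}\,dz.$$

By $V_\varepsilon(u) := \ell(\{z : z_k \ge 0,\ \sum_k z_k^\varepsilon \le u\}) = V_\varepsilon u^{K/\varepsilon}$ and change of measure, 
the right hand side equals $(1/K)\int_0^1 t\,dV_\varepsilon(t) = V_\varepsilon/(K + \varepsilon)$. It is then easy to get 
(3.11). Finally, by (3.7), 

$$\int_{\Gamma_u(\nu)} g = g(0)\left(h(u;\nu) - \sum_k \gamma_k \int_{\Gamma_u(\nu)} x_k^\varepsilon\,dx\right) + g(0) \int_{\Gamma_u(\nu)} r.$$

As $u \to 0$, $\sup_{x \in \Gamma_u(\nu)} |x| \to 0$, implying $r(x)/|x|^\varepsilon \to 0$ uniformly on $\Gamma_u(\nu)$, and 
by (3.11), $\int_{\Gamma_u(\nu)} r = o(u^{K/\varepsilon+1})$, which together with (3.10) yields (3.12). □ 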

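Proof of (3.13). Let $\nu_1 \le \nu_2$ without loss of generality. For $0 < u < \nu_1 + \nu_2$, 
$h(u;\nu)$ is equal to 

$$\int_0^1 1\{\nu_1 x^\varepsilon \le u - \nu_2\}\,dx + \int_0^1 1\{u - \nu_2 < \nu_1 x^\varepsilon \le u\} \left(\frac{u - \nu_1 x^\varepsilon}{\nu_2}\right)^{1/\varepsilon} dx.$$

The first integral on the right hand side equals $[(u - \nu_2)/\nu_1]^{1/\varepsilon}$ if $u > \nu_2$ and 0 
otherwise. By variable substitution $z = \nu_1 x^\varepsilon/u$, the second integral equals 

$$\frac{1}{\varepsilon}\left(\frac{u^2}{\nu_1\nu_2}\right)^{1/\varepsilon} \int_0^1 1\left\{1 - \frac{\nu_2}{u} < z \le \frac{\nu_1}{u}\right\} z^{1/\varepsilon - 1}(1 - z)^{1/\varepsilon}\,dz = \frac{\Gamma(1/\varepsilon)^2}{2\varepsilon\,\Gamma(2/\varepsilon)}\left(\frac{u^2}{\nu_1\nu_2}\right)^{1/\varepsilon}\left[F\left(\frac{\nu_1}{u} \wedge 1\right) - F\left(1 - \frac{\nu_2}{u}\right)\right],$$

where F is the Beta distribution function with parameters $1/\varepsilon$ and $1/\varepsilon + 1$. 
Since $F(x \wedge 1) = F(x)$ and $F(x \vee 0) = F(x)$, this yields the proof. □ 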



A.2. Theoretical details for Section 4 

Proof of Proposition 4.1. Denote $h(u) = h(u;\nu)$ and $D_t$ the regularization of 
$\Gamma_u(\nu)$ defined in (3.8). Let $Z_i = \nu'\xi_i^\varepsilon$. Under true $H_i$, since $\xi_i \sim \mathrm{Unif}(I)$, 
$P(Z_i \le u) = h(u)$. Since h is continuous, $h(Z_i) \sim \mathrm{Unif}(0,1)$. On the other hand, 
under false $H_i$, $P(h(Z_i) \le t) = P(\nu'\xi_i^\varepsilon \le h^*(t)) = G(\Gamma_{h^*(t)}(\nu)) = G(D_t)$, so 
by (4.2), the density of $h(Z_i)$ at 0 is g(0). Because procedure (3.9) is the BH 
procedure applied to $h(Z_i)$, it follows that its minimum attainable pFDR is 
$(1 - a)\alpha_*$. 

To get (4.1), let $F(t) = (1 - a)t + aG(D_t)$ and $t_*$ the maximum solution to 
$F(t) = t/\alpha$. Then 

$$G(D_{t_*}) = \left(\frac{1}{a\alpha} - \frac{1}{a} + 1\right) t_* = \left(\frac{1}{a\alpha} - \frac{1}{a\alpha_*} + g(0)\right) t_*.$$

Replacing the left hand side by (4.2), it is seen that as $\alpha \downarrow \alpha_*$, 

$$\frac{(t_*/V_\varepsilon)^{\varepsilon/K} \big(\prod_k \nu_k\big)^{1/K}}{K + \varepsilon} \sum_k \frac{\gamma_k}{\nu_k} \sim \frac{\alpha - \alpha_*}{a\,\alpha\,\alpha_*\,g(0)}.$$

Clearly, as $\alpha \downarrow \alpha_*$, $t_* \to 0$. If $G(D_t)$ is strictly concave, then by [12], Pow($\alpha$) = 
$G(D_{t_*}) \sim g(0)\,t_*$, which, combined with the asymptotic of $t_*$, proves (4.1). 
However, it is not clear whether $G(D_t)$ is strictly concave in general. To get 
around the problem, we use the following argument. Let $\tau = \tau_n$ be defined as 
in (3.2), where n is the total number of nulls. The goal is to show that, given 
$0 < \eta < 1$, as $0 < \alpha - \alpha_* \ll 1$, 

$$P((1 - \eta)\,t_* \le \tau_n \le (1 + \eta)\,t_*) \to 1, \quad \text{as } n \to \infty. \qquad \text{(A.1)}$$

If this holds, then it is easy to see that $G(D_{(1-\eta)t_*}) \le \mathrm{Pow}(\alpha) \le G(D_{(1+\eta)t_*})$. 
As $G(D_{(1\pm\eta)t_*}) \sim (1 \pm \eta)\,g(0)\,t_*$ and $\eta$ is arbitrary, (4.1) then follows. 
The remaining part of the proof is for (A.1). By (4.2), 

$$\frac{F(t)}{t} = 1 - a + \frac{aG(D_t)}{t} = \frac{1}{\alpha_*} - C t^{\varepsilon/K} + o(t^{\varepsilon/K}), \quad \text{as } t \to 0,$$

where $C > 0$ is a constant. By this expansion, there is $0 < \delta \ll 1$, such that 

$$\inf_{s \le (1-\eta)t} \frac{F(s)}{s} > \frac{F(t)}{t}, \qquad \sup_{(1+\eta)t \le s \le \delta} \frac{F(s)}{s} < \frac{F(t)}{t}, \quad \text{for } 0 < t < \delta.$$

By $g \in C(I)$ and $g(x) < g(0)$ for $x \ne 0$, for $t > 0$, $G(D_t)/t < g(0)$, yielding 
$F(t)/t < 1/\alpha_*$. Thus, for $0 < \alpha - \alpha_* \ll 1$, $\sup_{t \ge \delta} F(t)/t < 1/\alpha$. On the other 
hand, $t_* \in (0, \delta)$. Consequently, 

$$\inf_{s \le (1-\eta)t_*} \frac{F(s)}{s} > \frac{1}{\alpha}, \qquad \sup_{(1+\eta)t_* \le s \le 1} \frac{F(s)}{s} < \frac{1}{\alpha}.$$

Because the empirical distribution of $h(Z_i)$ converges to F in probability, the 
above inequalities imply (A.1). □ 

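Corollary A.2.1. For procedure (3.9) using $\Gamma_u(\nu)$, $H_i$ is rejected if and only 
if $\nu'\xi_i^\varepsilon \le \zeta$, where $\zeta = \zeta_n$ is a random variable such that given $0 < \eta \ll 1$, for 
$0 < \alpha - \alpha_* \ll 1$, $P(|\zeta - v_*| \le \eta\,v_*) \to 1$ as $n \to \infty$, where 

$$v_* \sim C(\nu)(\alpha - \alpha_*), \quad \text{with } C(\nu) = \frac{K + \varepsilon}{a\,\alpha_*^2\,g(0)} \Big/ \sum_k \frac{\gamma_k}{\nu_k}. \qquad \text{(A.2)}$$

Proof. Let $\zeta = h^*(\tau)$, where $h(u) = h(u;\nu)$ and $\tau$ is as in the proof of Proposition 4.1. 
Then $H_i$ is rejected $\iff h(\nu'\xi_i^\varepsilon) \le \tau \iff \nu'\xi_i^\varepsilon \le \zeta$. Let $v_* = h^*(t_*)$. 
Since $h^*(t) = \big(\prod_k \nu_k\big)^{1/K} (t/V_\varepsilon)^{\varepsilon/K}$ for $0 < t \ll 1$, the result follows from the asymptotics 
of $\tau$ and $t_*$. □ 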

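Proof of Proposition 4.2. For ease of notation, integration over x will be implicitly 
restricted in I. First consider $\mathrm{Pow}_0(\alpha)$ of procedure (3.2) using $D_t = \Gamma_{h^*(t)}$, 
where $\Gamma_u = \{x \in I : g(x) \ge u\}$ and $h(u) = \ell(\Gamma_u)$. By Lemma A.1.1, $G(D_t)$ is 
strictly concave. Then by [12], $\mathrm{Pow}_0(\alpha) = G(\Gamma_{u_*})$, where $u_* = h^*(t_*)$, with $t_*$ 
the unique positive solution to $(1 - a)t + aG(D_t) = t/\alpha$. Therefore, 

$$(1 - a)\,\ell(\Gamma_{u_*}) + aG(\Gamma_{u_*}) = \ell(\Gamma_{u_*})/\alpha. \qquad \text{(A.3)}$$

Using (A.3) followed by $\alpha_* = 1/(1 - a + a\,g(0))$, 

$$a\,g(0) \int_{\Gamma_{u_*}} [\gamma' x^\varepsilon - r(x)]\,dx = [1 - a + a\,g(0) - 1/\alpha] \int_{\Gamma_{u_*}} dx \qquad \text{(A.4)}$$
$$= \frac{\alpha - \alpha_*}{\alpha\,\alpha_*} \int_{\Gamma_{u_*}} dx. \qquad \text{(A.5)}$$

Fix $\eta > 0$. For $\alpha > \alpha_*$, denote 

$$v = v(\alpha) = 1 - u_*/g(0), \qquad \Delta = \alpha - \alpha_*. \qquad \text{(A.6)}$$

Then $\Gamma_{u_*} = \{x \in I : \gamma' x^\varepsilon - r(x) \le v\}$ and $v \downarrow 0$ as $\alpha \downarrow \alpha_*$. By $\gamma_k > 0$ and 
$r(x) = o(|x|^\varepsilon)$, for $0 < \alpha - \alpha_* \ll 1$, $\gamma' x^\varepsilon \le (1 + \eta)v$ and $|r(x)| \le \eta\,\gamma' x^\varepsilon$ on $\Gamma_{u_*}$. 
Together with (A.4) and (A.5), this gives 

$$(1 + \eta)\,a\,g(0) \int_{\gamma' x^\varepsilon \le (1+\eta)v} \gamma' x^\varepsilon\,dx \ge \frac{\Delta}{\alpha\,\alpha_*} \int_{\gamma' x^\varepsilon \le (1-\eta)v} dx,$$
$$(1 - \eta)\,a\,g(0) \int_{\gamma' x^\varepsilon \le (1-\eta)v} \gamma' x^\varepsilon\,dx \le \frac{\Delta}{\alpha\,\alpha_*} \int_{\gamma' x^\varepsilon \le (1+\eta)v} dx.$$

By Lemma 3.2, the inequalities imply 

$$\frac{a(1 + \eta)^{K/\varepsilon+2} K v}{K + \varepsilon} \ge \frac{\Delta(1 - \eta)^{K/\varepsilon}}{\alpha_*\,\alpha\,g(0)}, \qquad \frac{a(1 - \eta)^{K/\varepsilon+2} K v}{K + \varepsilon} \le \frac{\Delta(1 + \eta)^{K/\varepsilon}}{\alpha_*\,\alpha\,g(0)}.$$

Since $\eta$ is arbitrary, it follows that 

$$v \sim \frac{(K + \varepsilon)\Delta}{K a\,\alpha_*^2\,g(0)}.$$

Comparing with (A.2), $v \sim C(\gamma)\Delta$. 
Applying (A.3) followed by (3.10), as $\alpha \downarrow \alpha_*$, 

$$\mathrm{Pow}_0(\alpha) = \left(\frac{1}{a\alpha} - \frac{1 - a}{a}\right) \int_{\Gamma_{u_*}} dx \sim g(0) \int_{\gamma' x^\varepsilon \le C(\gamma)\Delta} dx \sim g(0)\,V_\varepsilon \left[\frac{(K + \varepsilon)\Delta}{\big(\prod_k \gamma_k\big)^{1/K} K a\,\alpha_*^2\,g(0)}\right]^{K/\varepsilon}.$$

On the other hand, by Proposition 4.1, Pow($\alpha$) has the same asymptotic. 
Then $\mathrm{Pow}_0(\alpha)/\mathrm{Pow}(\alpha) \to 1$. 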

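Let $\vartheta = C(\gamma)(\alpha - \alpha_*)$. The above argument shows that, for procedure (3.2) 
based on $\Gamma_u$, if $0 < \alpha - \alpha_* \ll 1$, then $H_i$ is rejected if and only if $\gamma'\xi_i^\varepsilon - r(\xi_i) \le \zeta_0$, 
where $\zeta_0 = \zeta_{0,n}$ is a random variable satisfying $P(|\zeta_0 - \vartheta| \le \eta\vartheta) \to 1$ as $n \to \infty$. 
On the other hand, by Corollary A.2.1, for procedure (3.9) based on $\Gamma_u(\gamma)$, $H_i$ 
is rejected if and only if $\gamma'\xi_i^\varepsilon \le \zeta$, where $\zeta$ satisfies $P(|\zeta - \vartheta| \le \eta\vartheta) \to 1$ as 
$n \to \infty$. 

Recall $V_0 = \{\text{true } H_i : \gamma'\xi_i^\varepsilon - r(\xi_i) \le \zeta_0\}$ and $V = \{\text{true } H_i : \gamma'\xi_i^\varepsilon \le \zeta\}$. 
Since $r(x) = o(\gamma' x^\varepsilon)$, by the asymptotics of $\zeta_0$ and $\zeta$, given $0 < \eta \ll 1$, as 
$0 < \alpha - \alpha_* \ll 1$, 

$$P(\{\text{true } H_i : \gamma'\xi_i^\varepsilon \le (1 - \eta)\vartheta\} \subset V_0 \cap V) \to 1,$$
$$P(V_0 \,\triangle\, V \subset \{\text{true } H_i : (1 - \eta)\vartheta < \gamma'\xi_i^\varepsilon \le (1 + \eta)\vartheta\}) \to 1.$$
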

It follows that 



ry(a) < dx / dx 

J(l-?7)iS<7'a!! e <(l+?))iJ / ■/-y'x e < 



As 77 is arbitrary, this gives ?v(a) — * as a J, a*. Likewise, r^>(a) — ► 0. □ 
Proof of Lemma 1^.1. By the assumptions of the lemma and [12], 

Powj(a) = G(Dit'), (A.7) 
where t- = sup{i : (1 - a)t + aG(D it ) > t/a} 

and furthermore, t* < T. Since G(D lt ) < G(D 2 t) for t < T and both arc 
continuous, it is seen that t* < tJj. On the other hand, 

G(At« )=( — -- + l) ** ~ fl(0)tj , as a a.. (A.8) 
\aa a / 

As a result, Powi(a) < Pow 2 (a). □ 
Proof of Proposition 4-3. By (A.8), 

jj _ g(giq) _ Powi(q) t\ _ g(0)tl - GjDu'J 

t* ~ G(D 2 t* ) ~ Pow 2 (a) ^ t* ~ g(0)t* - G(D M .)- 

Suppose M < 00. Then by the last equality and (4.2), as a j a*, 



*S ff (0)i5 - G(D 2t .) \t* 2 

giving t\/tl ~ [\/M) K / £ and the proof. The case M = 00 is likewise. □ 
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Proof for Example 4-1 (Case K > 2). Define D\ t and D 2 t as in the case K = 2. 
Fix r\ > and a k g (0,7*.), so that g(x) < g(0)(l - J2 a kX%) on [0,rj\ K . Then 



G{D lt ) < 5 (0) 



a*. 



a;? da; 

all 2^6(0, r/) 



where s = exp(— 2^(1 — i)), with fjj the Gamma distribution function with 
K degrees of freedom and scale parameter 1. We need to evaluate 

/, -I dx = V K+£ L.. XK<r 4 d X = ^E[Un {UV < r}], 

all Xi£(0,r]) all 2^6(0,1) 

where r = s/r/ K , U ~ Unif (0, 1) and V are independent, and V is the product of 
Ui, . . . , Uk-i i.i.d. ~ Unif (0, 1). By transformation X = — In U and Z = — In V ', 
the expectation equals 



/•— In r 

A(r) := / e 
Jo 



-(l+e)(- lnr-x) 



/oo 
- hi r 



Recall that for n > 1, as x — ► oo, 

l-F„(a;)~a: B - 1 e-7(n-l)! 
As t — > 0, s — > and hence r — > 0, yielding 



(A.9) 



A(r) 



„l+e 



■ In r 



- K - 2 e ex dx + r 1+£ 



r(lnr ) 



l\K-2 



(K-2)\ ■ 



So for t < 1, G(D lt ) < g(0)[t - CsQns- l ) K -% where C > is a constant. On 
the other hand, by (4.2), G(D 2t ) > g(0)[t - C"i 1+£ / K ], where C > is another 
constant. By (A.9), for some constants Ci,C2 > 0, 



sflns- 1 ) 



-l\it-2 



sOns" 1 ) 



-i\i<:-2 



Cl 



t i+e/K (1- F K (- In s)) 1 + s / K s^ K (-\ns) c 

As a result, for t < 1, G(D lt ) < G(D 2t )- 



□ 



Proof for Example 4.2. Recall # _1 (s) ~ ^/21og(l/s) as s j 0. Let s k = Wk/w. 
Given c > 1 and a k € (0,7fc), there is < r\ -C 1, such that 1) <y(a;) < <?(0)(1 — 
a'aj) for x e (0, 77 ^ and 2) letting 

B t := ja; G (0,77)^ : ^ s k ^log(l/x k ) > CA /log(l/i)| , 



for < t < 1, B t C (0, n D lt . Then 
G(D lt )<g(0) / da;- / a'a;da; 

UBk JB t 



9(0) 



K 



t-^a k / if da; 

fc=i 
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J x e dx = \\nt\ K J 



y fc G(i5t,oo) 



As t i 0, hit -► -oo and <5 t | 0. So by £ sf = 1, 
1 



In 

and hence 



x e j dx -> inf [yi H h + ej/,-] = c 



l + e 



— \n[g(0)t -G(D lt )} < 



tj.o 



hit 



lim - — In [ at / a:| <te | 



1 



k tio Int 



mm lim t — In ( / xidx \ = c 2 



1 - 



l + e k 



Since c > 1 is arbitrary and min^ s k < 1/K, (4.3) then follows. 
Proof of Equations (4-5) and (4-7). By (4.4), as 1 J, 0, 



□ 



I0<x k <f k (t) 



(1 - j'x e + r(x)) dx 



G{D lt )=g{Q) [ 
Jo< 

= 9(0) (t-i±^2'»A(*) 1+e II/«(*) 

Since l\fk{t) = t, (4.5) then follows. 

Let f k (t) = c k t x l K . By (A.7), (A.8) and (4.5), Pow(a) - g(0)t* asm a*, 
where t* > is the solution to 

1 - a + ag(0) (l - ^-j- £ 7fc c|i £/A ') = 1/a. 

Since 1/a* = 1 — a + a.g(O), 
. e /K _ 1 + £ ( 1 1 ' 



E 



l + e 
a« 2 ff(0) y 



^TfcCfe I (a™ a*). 



a^(0) \a* a 
It is then easy to see (4.7) holds. 

Proof for Example 4-4- We need to show L < g(0) = H fe Pfe, where 

LAWn j# ^M . A (*) II,.* ^ (») 

_L = sup — — - — — — — — — - < max sup ■ 



□ 
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Let ip k = F k /F ok . By F k (x) = f*^ r k (t)f k{t) dt and the assumption (4.8) on 
j-fc, for any finite c, sup 2 , <c ipk(x) < Pk- Since tpk(x) is continuous and tends to 
1 as x — ► oo, it follows that sup^, ipk(x) < Pfc- Then for each k, 

T ^(xjn^F.tx) = s ? rfe(x) n II" 

Thus L < g(0). □ 

Proof for Example 4.5. We need to show $L := \sup_{x \ge T} r(x) < g(0) = p_1 p_2$ for 
$T > 0$, where 

$$r(x) = \frac{\int_0^x f_1(t) f_2(x - t)\,dt}{\int_0^x f_{01}(t) f_{02}(x - t)\,dt}.$$

As $f_k(x) < p_k f_{0k}(x)$ for any $x > 0$ with $f_k(x) > 0$, it is seen $r(x) < g(0)$ 
for any $x > 0$. It remains to be shown $r(\infty) < g(0)$. Once this is done, by the 
continuity of r(x) on $(0, \infty)$, $L < g(0)$. Given $c, \lambda \in (0,1)$, 

$$I := \int_0^x f_{01}(t) f_{02}(x - t)\,dt = I_1 + I_2,$$

where $I_1$ is the integral over [0, cx] and $I_2$ over [cx, x]. As $x \gg 1$, for $t \in [0, cx]$, 
$\lambda x^{-s_2} \le f_{02}(x - t) \le \lambda^{-1}(1 - c)^{-s_2} x^{-s_2}$. Then 

$$\lambda x^{-s_2} \int_0^{cx} f_{01}(t)\,dt \le I_1 \le \lambda^{-1}(1 - c)^{-s_2} x^{-s_2} \int_0^{cx} f_{01}(t)\,dt.$$

Similarly, by using $f_{01}(x) \sim x^{-s_1}$, 

$$\lambda x^{-s_1} \int_{cx}^x f_{02}(x - t)\,dt \le I_2 \le \lambda^{-1} c^{-s_1} x^{-s_1} \int_{cx}^x f_{02}(x - t)\,dt.$$

As $x \to \infty$, $\int_0^{cx} f_{01} \to 1$ and $\int_{cx}^x f_{02}(x - t)\,dt = \int_0^{(1-c)x} f_{02} \to 1$. Therefore, if 
$s_1 > s_2$, then $I_2 = o(I_1)$ and hence $I \sim I_1$. Since c and $\lambda$ are arbitrary, $I \sim x^{-s_2}$. 
Likewise 

$$J := \int_0^x f_1(t) f_2(x - t)\,dt \sim p_2\,x^{-s_2},$$

and hence $r(x) \to p_2$. Similarly, if $s_1 < s_2$, then $r(x) \to p_1$. If $s_1 = s_2$, write 
$I = I_1 + I_2 + I_3$, where the integrals $I_i$ are over [0, cx], [cx, (1 − c)x] and [(1 − c)x, x] 
respectively, with $0 < c \ll 1$. Following argument similar to the above, $I_1 \sim I_3 \sim x^{-s_1}$ 
while $I_2 = O(x^{-2s_1})$. Then $I \sim 2x^{-s_1}$. Likewise, $J \sim (p_1 + p_2)\,x^{-s_1}$. 
Thus $r(x) \to (p_1 + p_2)/2$. In any case, $r(\infty) < g(0)$. □ 




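A.3. Some facts about $V_\varepsilon$ 

Let $V_{\varepsilon,K}(u) = \int_0^1 \cdots \int_0^1 1\{\sum_{k=1}^K x_k^\varepsilon \le u\}\,dx$ and $V_{\varepsilon,K} = V_{\varepsilon,K}(1)$. It is not hard to 
see that $V_{\varepsilon,K}(u) = u^{K/\varepsilon}\,V_{\varepsilon,K}$ for $u \in [0,1]$. It can be shown that 

$$V_{\varepsilon,K} = \frac{(1/\varepsilon)^{K-1}\,\Gamma(1/\varepsilon)^K}{K\,\Gamma(K/\varepsilon)}, \quad K = 1, 2, \ldots \qquad \text{(A.10)}$$

This is clear for K = 1. For K > 1, by first integrating $x_1,\ldots,x_{K-1}$, 

$$V_{\varepsilon,K} = \int_0^1 V_{\varepsilon,K-1}(1 - x_K^\varepsilon)\,dx_K = V_{\varepsilon,K-1} \int_0^1 (1 - x_K^\varepsilon)^{(K-1)/\varepsilon}\,dx_K$$
$$= V_{\varepsilon,K-1} \times \frac{1}{\varepsilon} \int_0^1 t^{1/\varepsilon - 1}(1 - t)^{(K-1)/\varepsilon}\,dt = V_{\varepsilon,K-1} \times \frac{(K - 1)\,\Gamma(1/\varepsilon)\,\Gamma((K - 1)/\varepsilon)}{\varepsilon K\,\Gamma(K/\varepsilon)}.$$

Then (A.10) follows by induction. 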
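
As a quick numerical check of the closed form $V_{\varepsilon,K} = (1/\varepsilon)^{K-1}\Gamma(1/\varepsilon)^K / (K\,\Gamma(K/\varepsilon))$, the sketch below evaluates it in cases where the volume is known from elementary geometry (a simplex for $\varepsilon = 1$, a quarter disc or an octant of a ball for $\varepsilon = 2$). The function name is illustrative.

```python
import math


def v_eps(eps: float, K: int) -> float:
    """Volume of {x in [0,1]^K : sum_k x_k^eps <= 1} from the Gamma-function formula."""
    t = 1.0 / eps
    return t ** (K - 1) * math.gamma(t) ** K / (K * math.gamma(K * t))


if __name__ == "__main__":
    print(v_eps(1.0, 2), 1 / 2)          # triangle
    print(v_eps(1.0, 3), 1 / 6)          # tetrahedron
    print(v_eps(2.0, 2), math.pi / 4)    # quarter of the unit disc
    print(v_eps(2.0, 3), math.pi / 6)    # one octant of the unit ball
```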

Finally, in Example 4.3, it is claimed that $\left(\frac{1 + \varepsilon}{K + \varepsilon}\right)^{K/\varepsilon} < V_{\varepsilon,K}$ for K > 1. To 
show this, write $t = 1/\varepsilon$ and $H(t) = \ln[t\,\Gamma(t)/(1 + t)^t]$. Then by (A.10), the 
above inequality is equivalent to $K H(t) > H(Kt)$, K > 1. It is not hard to get 
$\lim_{t \to 0+} H(t) = 0$. Therefore, if one can show H(t) is concave for t > 0, then 
the desired inequality is obtained. Now 

$$H''(t) = (\ln \Gamma(t))'' - \frac{1}{t^2} - \frac{1}{(1 + t)^2} - \frac{1}{1 + t}.$$

It is known that $(\ln \Gamma(t))'' = \sum_{k=0}^\infty \frac{1}{(k + t)^2} > 0$. Since $1/(k + t)^2 < 1/(k - 1 + t) - 1/(k + t)$ 
for $k \ge 2$, it is seen that $H''(t) < 0$ for t > 0 and hence 
H(t) is strictly concave. 
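
The inequality $K H(t) > H(Kt)$ can also be checked numerically on a grid. The sketch below takes $H(t) = \ln[t\,\Gamma(t)/(1 + t)^t]$ as the form assumed for this illustration; the grid of values and the function names are arbitrary choices.

```python
import math


def H(t: float) -> float:
    """H(t) = ln[t * Gamma(t) / (1 + t)^t], computed on the log scale for stability."""
    return math.log(t) + math.lgamma(t) - t * math.log1p(t)


def gap(t: float, K: int) -> float:
    """K * H(t) - H(K * t); positive whenever the claimed inequality holds."""
    return K * H(t) - H(K * t)


if __name__ == "__main__":
    for K in (2, 3, 5):
        for t in (0.1, 0.5, 1.0, 2.0, 5.0):
            print(K, t, gap(t, K))
```

For instance, at t = 1 and K = 2 the gap is $-2\ln 2 - (\ln 2 - 2\ln 3) = \ln(9/8) > 0$, consistent with strict concavity of H together with $H(0+) = 0$.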
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